Maybe it'd be useful if there were a function to "cast" a Unicode string into a binary string without changing anything on the inside. That way one could do strlen(str_to_binary($string)). It would also be useful for storing and reading strings in binary form (with a counterpart, binary_to_str()).
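To make the proposal concrete, here is a minimal sketch in Python, since PHP 6's Unicode API never shipped in a release and can't be run today. The names str_to_binary() and binary_to_str() come from the proposal; their behavior here is an assumption. The "internal" form is taken to be UTF-16LE, standing in for PHP 6's UTF-16 representation. Python does not itself store str as UTF-16, so this encodes rather than truly casting, which is essentially what Andrei describes below with unicode_encode($uni, 'UTF-16').

```python
# Hypothetical helpers modeled on the proposed str_to_binary()/binary_to_str().
# Assumption: UTF-16LE (no BOM) stands in for PHP 6's internal UTF-16 form.
INTERNAL_ENCODING = "utf-16-le"


def str_to_binary(text: str) -> bytes:
    """Return the UTF-16LE bytes of text, without a byte order mark."""
    return text.encode(INTERNAL_ENCODING)


def binary_to_str(data: bytes) -> str:
    """Inverse of str_to_binary(): decode UTF-16LE bytes back into text."""
    return data.decode(INTERNAL_ENCODING)


if __name__ == "__main__":
    s = "naïve €"
    raw = str_to_binary(s)
    # Code points vs. bytes in the UTF-16 form: prints "7 14"
    print(len(s), len(raw))
    # Round trip for storage: the text comes back unchanged
    assert binary_to_str(raw) == s
```

Here len(str_to_binary(s)) plays the role of strlen(str_to_binary($string)). Characters outside the Basic Multilingual Plane take four bytes in UTF-16, so the byte count is not simply twice the character count.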
Ron

"Andrei Zmievski" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> The only way they can get at the internal UTF-16 representation is via
> unicode_encode($uni, 'UTF-16'), which will return a binary UTF-16 string.
> In that case, strlen() will work just as well.
>
> -Andrei
>
>
> On Jun 22, 2006, at 11:30 PM, Andi Gutmans wrote:
>
>> I don't quite agree. I think there's a good chance people will want to
>> save Unicode strings in a binary format for performance reasons. Save it
>> the way it looks in memory, and put it back... Why convert to UTF-8 or any
>> other encoding if it's just about storage?
>>
>> Andi
>>
>>> -----Original Message-----
>>> From: Sara Golemon [mailto:[EMAIL PROTECTED]
>>> Sent: Thursday, June 22, 2006 9:15 PM
>>> To: "Ron Korving"
>>> Cc: internals@lists.php.net
>>> Subject: Re: [PHP-DEV] Re: strlen() under unicode.semantics
>>>
>>>> Still, it's gotta be useful to know how many bytes it occupies.
>>>> Perhaps for Content-Length headers or something. There are plenty of
>>>> low-level concepts to think of where one might need this. And even if
>>>> you can't think of any reason now, you don't wanna get hit in the face
>>>> by it and have to implement such a function for PHP 6.0.1.
>>>>
>>> For this type of usage, I'd think it'd be more relevant to know how many
>>> bytes the string will occupy in a given output encoding than what it
>>> happens to occupy in the underlying implementation. In the example you
>>> cited, string contents will more typically be sent as UTF-8 rather than
>>> the UTF-16 of PHP's internal encoding.
>>>
>>> $utf8str = unicode_encode($unistr, 'utf8');
>>>
>>> header('Content-Type: text/html; charset=utf-8');
>>> header('Content-Length: ' . strlen($utf8str));
>>> echo $utf8str;
>>>
>>> I'm not saying it's impossible that a legitimate use will come up to know
>>> the internal byte usage of a Unicode string, and there's certainly no harm
>>> in adding such a function (apart from the tired shot-foot argument). I
>>> just doubt you (or anyone) will come up with such a case anytime soon.
>>>
>>> -Sara
>>>
>>> --
>>> PHP Internals - PHP Runtime Development Mailing List
>>> To unsubscribe, visit: http://www.php.net/unsub.php
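Sara's point about Content-Length generalizes: the header counts the bytes of the body as actually sent, so it should be computed from the encoded output, not from the character count or the internal UTF-16 size. A minimal Python sketch of that idea follows, again in Python because the PHP 6 functions above can't be run. build_response() is a hypothetical helper, and UTF-8 is assumed as the output charset.

```python
# Hypothetical helper: encode the body first, then derive Content-Length
# from the encoded bytes that will actually go over the wire.
def build_response(body: str, charset: str = "utf-8") -> tuple[dict[str, str], bytes]:
    payload = body.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(payload)),
    }
    return headers, payload


if __name__ == "__main__":
    text = "Grüße"
    headers, payload = build_response(text)
    # Characters, UTF-8 bytes (the correct Content-Length), UTF-16 bytes:
    # prints "5 7 10"
    print(len(text), headers["Content-Length"], len(text.encode("utf-16-le")))
```

The three numbers differ, which is why a byte count taken from the internal UTF-16 form would give a wrong Content-Length for a UTF-8 response.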