[PHP-DEV] Re: strlen() under unicode.semantics

Sara Golemon Wed, 21 Jun 2006 13:10:48 -0700

> Enjoyed Andrei's talk at the NYPHP Conference last week about unicode in
> PHP 6.  He mentioned that when unicode.semantics is on, strlen() will
> return the number of characters rather than the number of bytes, like
> mb_strlen() does or strlen() if mbstring.func_overload is on.
>
> The hitch here is there are situations where one needs to know how many
> bytes are in a string.  Is there a function I've overlooked that does
> this or will do this, please?
>
My first question is: Why do you need to know the number of bytes occupied
by a textual string?  Is it because you want to work with binary strings?
Because that's still very possible.

Even with unicode.semantics=on, the binary string type can be used
explicitly in a few ways:

$a = b"This string contains an 0xF0 byte: \xF0";
$alen = strlen($a);

This is the simplest way: a lowercase b (or u) prefix explicitly marks a
string literal as a binary (or unicode) string.  Leaving the prefix off
yields whatever type is appropriate to unicode.semantics.
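
As a rough check of the snippet above, here is a minimal sketch that assumes
a released PHP build: one that accepts the b prefix but has no
unicode.semantics switch, so strlen() counts bytes for every string.  Under
that assumption the result is a plain byte count:

```php
<?php
// Binary string literal from the message; the b prefix marks it as binary.
$a = b"This string contains an 0xF0 byte: \xF0";

// 35 ASCII bytes plus the single \xF0 byte.
$alen = strlen($a);

echo $alen, "\n"; // prints 36
```

With unicode.semantics=on the same literal should still report its byte
length, since it is explicitly binary; that part is taken from the
description above, not from a test run.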

In other cases, such as reading from a binary mode file:

$fp = fopen('foo.bin', 'rb');
$str = fread($fp, 100);

The string is always returned as a binary string, regardless of
unicode.semantics.  Conversely, when reading a text-mode file:

$fp = fopen('foo.txt', 'rt');
$str = fread($fp, 100);

The type of string returned will depend on the unicode.semantics switch (in
order to ensure maximum backward compatibility, since scripts designed for
Windows already use text mode to handle linebreak transformation).
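
The binary-mode case can be sketched as follows, again assuming a released
PHP build without unicode.semantics.  The temporary file is a hypothetical
stand-in for foo.bin, and its contents are invented for illustration:

```php
<?php
// Hypothetical temporary file standing in for foo.bin.
$path = tempnam(sys_get_temp_dir(), 'bin');
file_put_contents($path, "na\xC3\xAFve"); // 5 characters, 6 bytes in UTF-8

// Binary mode: the data comes back as raw bytes.
$fp = fopen($path, 'rb');
$str = fread($fp, 100); // reads at most 100 bytes; the file only holds 6
fclose($fp);
unlink($path);

$len = strlen($str);
echo $len, "\n"; // prints 6
```

The length reported is the byte count of the data read, not the number of
characters it would decode to.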

-Sara

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
