> Enjoyed Andrei's talk at the NYPHP Conference last week about unicode in
> PHP 6. He mentioned that when unicode.semantics is on, strlen() will
> return the number of characters rather than the number of bytes, like
> mb_strlen() does or strlen() if mbstring.func_overload is on.
>
> The hitch here is there are situations where one needs to know how many
> bytes are in a string. Is there a function I've overlooked that does
> this or will do this, please?
>

My first question is: why do you need to know the number of bytes occupied by a textual string? Is it because you want to work with binary strings? Because that's still very possible:
Even with unicode.semantics=on, the binary string type may be used explicitly in a few ways:

    $a = b"This string contains an 0xF0 byte: \xF0";
    $alen = strlen($a);

This is the simplest: a lowercase b (or u) prefix explicitly marks a string literal as a binary (or unicode) string. Leaving the prefix off yields whichever type is appropriate to the unicode.semantics setting.

In other cases, such as reading from a binary-mode file:

    $fp = fopen('foo.bin', 'rb');
    $str = fread($fp, 100);

the string is always returned as a binary string, regardless of unicode.semantics.

Conversely, when reading a text-mode file:

    $fp = fopen('foo.txt', 'rt');
    $str = fread($fp, 100);

the type of string returned depends on the unicode.semantics switch (to ensure maximum BC, since scripts designed for Windows already use text mode to handle linebreak transformation).

-Sara

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
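
The byte-versus-character distinction discussed above can be illustrated with a minimal sketch. It assumes a released PHP (5.2.1 or later) with the mbstring extension loaded, not a build with unicode.semantics: in those releases the b prefix is accepted but changes nothing, and strlen() always counts bytes. The variable names are illustrative only.

```php
<?php
// Assumption: released PHP (5.2.1+) with mbstring loaded; no unicode.semantics here.
// "naïve" written out as UTF-8 bytes: the ï occupies two bytes (0xC3 0xAF).
$text = b"na\xC3\xAFve";

// strlen() counts bytes in these PHP versions.
$byteCount = strlen($text);

// mb_strlen() counts characters when given the encoding explicitly.
$charCount = mb_strlen($text, 'UTF-8');

echo "bytes: $byteCount, characters: $charCount\n";
```

Marking the literal with b keeps the intent explicit: the value is meant to be measured in bytes whatever the surrounding settings are.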