> and why this will not return true if $str is ISO-8859-1?

For 7-bit characters (code points <= 127) it would return true, since ASCII bytes are also valid UTF-8. But if there is even a single higher (non-ASCII) byte, it only returns true if the byte sequences follow UTF-8 rules. So for typical ISO-8859-1 text containing non-ASCII characters, it returns false.
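To make the check concrete, here is a rough analogue of the quoted one-liner. It is written in Python rather than PHP. The function name mirrors the PHP one, and Python's strict `bytes.decode("utf-8")` stands in for `htmlspecialchars($str, 0, "UTF-8")`, so edge cases such as surrogates or overlong forms may not behave exactly as in PHP. It accepts pure ASCII and the empty string. It rejects the ISO-8859-1 bytes for "é A" and accepts the UTF-8 bytes for "é A".

```python
def str_is_utf8(data: bytes) -> bool:
    """Return True if `data` is a valid UTF-8 byte sequence.

    Python analogue of the PHP str_is_utf8() one-liner; the strict UTF-8
    decoder plays the role of htmlspecialchars($str, 0, "UTF-8").
    The empty string decodes fine, matching the PHP `$str == ""` case.
    """
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII (every byte <= 0x7f) is always valid UTF-8.
print(str_is_utf8(b"A"))                 # True
# "é A" in ISO-8859-1: 0xe9 starts a 3-byte sequence, but 0x20 is not a continuation byte.
print(str_is_utf8(b"\xe9\x20\x41"))      # False
# "é A" in UTF-8: 0xc3 0xa9 is a valid 2-byte sequence.
print(str_is_utf8(b"\xc3\xa9\x20\x41"))  # True
# The same four bytes read as Latin-1 spell "Ã© A", yet they still pass the check.
print(b"\xc3\xa9\x20\x41".decode("latin-1"))  # Ã© A
```

The last line shows the caveat: validity as UTF-8 says nothing about which encoding the sender actually used.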
For example, the character é is 0xe9 (code point 233) in ISO-8859-1, but it is encoded as the two-byte sequence 0xc3a9 in UTF-8. So if it encounters a byte stream such as 0xe92041 ("é A" in ISO-8859-1), it knows it cannot be UTF-8: 0xe9 must be followed by two continuation bytes (0x80-0xbf), and 0x20 is not one. But if it sees 0xc3a92041 ("é A" in UTF-8), it knows it is valid UTF-8 (it could be another character set, but it is valid as UTF-8)...

Please note that it's not checking whether the string **is** UTF-8, only whether its byte sequences are valid when interpreted as UTF-8. You could have the Latin-1 string 0xc3a92041 ("Ã© A"), which also parses as valid UTF-8 (as "é A")...

On Wed, Jun 22, 2011 at 9:40 AM, Reindl Harald <h.rei...@thelounge.net> wrote:
>
>
> On 22.06.2011 15:30, Gustavo Lopes wrote:
>> On Wed, 22 Jun 2011 13:21:10 +0100, Reindl Harald <h.rei...@thelounge.net>
>> wrote:
>>
>>> On 22.06.2011 14:14, Gustavo Lopes wrote:
>>>> It's actually 3 lines:
>>>>
>>>> function str_is_utf8($str) {
>>>>     return $str == "" || htmlspecialchars($str, 0, "UTF-8");
>>>> }
>>>
>>>
>>> WTF should this do?
>>> this won't return boolean
>>>
>>
>> The reason it works is that
>> 1) || coerces the operands into booleans (if they get to be evaluated)
>> 2) htmlspecialchars returns "" on bad input sequence
>> 3) (bool) "" === false
>>
>> But even if you didn't know these things, you should have bothered to at
>> least test it
>> before sending this response
>
> and why this will not return true if $str is ISO-8859-1?
>
>

--
PHP Internals - PHP Runtime Development Mailing List