On 2011.06.21 20:51, Reindl Harald wrote:
>> utf-8 is strict format. If you expect utf-8 and someone submits something
>> else, you can tell that without any string function. You can verify utf-8
>> strings in pcre. You can convert nbspace to regular space, if you want.
>> utf-8 does not have any byte sequence that can collide with nbspace byte
>> sequence in utf-8
>
> show me a practicable way to detect if some input data contains UTF8
> mb_string-functions are out of the game because there are many servers
> even of real big companies where they are not available

:)

I said pcre, not mbstring. If you had read a good UTF-8 manual, like I did
about 8 years ago, you would know how to detect 8-bit input that is not
UTF-8.

UTF-8 is a variable-length encoding with very specific rules about how
bytes are arranged. The first byte tells you how many bytes the character
occupies, and each following byte must fall within a fixed range of values.
If you eliminate the valid multi-byte UTF-8 sequences (two to six bytes in
the original definition, two to four under RFC 3629) and still have 8-bit
data left, the input is not UTF-8.

--
Tomas

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
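
A minimal sketch of the check described above, written in Python only to keep
it short and runnable. The names UTF8_PATTERN, is_valid_utf8 and nbsp_to_space
are hypothetical and not from this thread. The byte ranges follow the
well-formed sequences of RFC 3629 (at most four bytes, no overlong forms, no
surrogates). The same byte-level pattern should carry over to a PCRE call, but
that is an assumption and has not been tested here.

```python
import re

# One alternative per well-formed UTF-8 sequence shape (RFC 3629).
# The first byte decides the length; the rest must be continuation bytes.
UTF8_PATTERN = re.compile(
    rb"""
    (?:
        [\x00-\x7F]                        # 1 byte: ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2 bytes, overlong forms excluded
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3 bytes, overlong forms excluded
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3 bytes, general case
      | \xED[\x80-\x9F][\x80-\xBF]         # 3 bytes, surrogates excluded
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4 bytes, planes 1 to 3
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4 bytes, planes 4 to 15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4 bytes, plane 16
    )*
    """,
    re.VERBOSE,
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if every byte of data belongs to a well-formed sequence."""
    return UTF8_PATTERN.fullmatch(data) is not None


def nbsp_to_space(data: bytes) -> bytes:
    """Replace U+00A0 (bytes C2 A0) with an ASCII space.

    Assumes data is valid UTF-8. In valid UTF-8, 0xC2 only ever appears as a
    lead byte, so the pair C2 A0 cannot be part of any other character.
    """
    return data.replace(b"\xC2\xA0", b" ")


if __name__ == "__main__":
    samples = {
        "utf-8": "na\u00efve".encode("utf-8"),
        "latin-1": "na\u00efve".encode("latin-1"),
    }
    for label, raw in samples.items():
        print(label, is_valid_utf8(raw))
```

The alternatives are mutually exclusive on their first byte, so the pattern
either consumes the whole input or stops at the first malformed byte.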