On 21.06.2011 22:19, Tomas Kuliavas wrote:
> 2011.06.21 20:51 Reindl Harald wrote:
>>> utf-8 is strict format. If you expect utf-8 and someone submits
>>> something else, you can tell that without any string function. You
>>> can verify utf-8 strings in pcre. You can convert nbspace to regular
>>> space, if you want. utf-8 does not have any byte sequence that can
>>> collide with nbspace byte sequence in utf-8
>>
>> show me a practicable way to detect if some input data contains UTF8
>> mb_string-functions are out of the game because there are many servers
>> even of real big companies where they are not available
>
> :) I've said pcre and not mbstring. If you read fine utf-8 manual like I
> did about 8 years ago, you would know how to detect 8bit inputs that are
> not in utf-8. utf-8 is variable byte length character set which has very
> specific rules about the way bytes are arranged. You can tell length of
> symbol in bytes based on first byte. You can tell what kind of byte values
> should be used for second, third, fourth, fifth or sixth byte. If you
> eliminate five valid utf-8 8bit byte sequences and still have 8bit data,
> it is not utf-8

I do not understand a word of that. What I miss is a simple str_is_utf8(), or whatever you want to call it, that does this check natively and fast on a given variable and makes it possible to stop a script on unexpected input without degrading performance.
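
The byte rules described in the quoted text can be written as one byte-level regular expression. What follows is only a minimal sketch, in Python rather than PHP, and it assumes the stricter RFC 3629 definition of UTF-8 (at most four bytes per character, no overlong forms, no surrogates) instead of the older five- and six-byte forms mentioned in the quote. The name str_is_utf8 is hypothetical, borrowed from the wish above; it is not an existing library function.

```python
import re

# Byte-level pattern for well-formed UTF-8, assuming the RFC 3629 rules.
# The first byte of each alternative determines how many continuation
# bytes (0x80-0xBF) must follow.
_UTF8_RE = re.compile(rb"""
    (?:
        [\x00-\x7F]                        # 1 byte: plain ASCII
      | [\xC2-\xDF][\x80-\xBF]             # 2 bytes (C0 and C1 would be overlong)
      | \xE0[\xA0-\xBF][\x80-\xBF]         # 3 bytes, excluding overlong forms
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3 bytes, general case
      | \xED[\x80-\x9F][\x80-\xBF]         # 3 bytes, excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4 bytes, excluding overlong forms
      | [\xF1-\xF3][\x80-\xBF]{3}          # 4 bytes, general case
      | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4 bytes, up to U+10FFFF
    )*
""", re.VERBOSE)


def str_is_utf8(data: bytes) -> bool:
    """Return True if every byte of `data` belongs to a valid UTF-8 sequence.

    Hypothetical helper, named after the function wished for in this thread.
    """
    return _UTF8_RE.fullmatch(data) is not None
```

A script could call str_is_utf8() on raw input and stop early when it returns False. Because the pattern uses only byte ranges and plain alternation, it should carry over to pcre largely unchanged, though that has not been tested here.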