Hello Rouven,

the lack of "good" UTF-8 support is a long-standing topic in PHP, and improvements (at least I think so) are very welcome here!
> Before I write an RFC I'd like to get some feedback what you think about
> adding the following functions to PHP 5.6 (possibly more to follow):
> utf8_is_valid, utf8_strlen, utf8_substr, utf8_strpos, utf8_strrpos,
> utf8_str_split, utf8_strrev, utf8_recover, utf8_chr, utf8_ord,
> string_is_ascii.
>
> Most of them (exceptions are utf8_chr, utf8_is_valid, utf8_recover and
> string_is_ascii) are currently written in a way that they emit a warning
> when they encounter invalid UTF-8 and return with null. This should
> encourage applications to check their input with utf8_is_valid and either
> stop further processing or to fall back to utf8_recover to get a valid
> string. This should improve security since there are attack vectors when
> malformed sequences get interpreted as another encoding.
>

I'm currently using the multibyte support from the "mb_" functions and I'm generally happy with it. For me it's no problem to use this extension with a custom web server. The biggest problem I had with the extension is that not every standard string function has an "mb_" counterpart. The most famous missing one, I think: mb_str_replace.

Something to think about: why not combine your work with the mb_ extension? For emitting a warning you could use a configuration setting, either in the ini file or by calling a function. I would rather have one complete "mb/utf-8" library than yet another one. As you have already written, there are already some out there... and for core I would currently prefer "mb_" because those functions have been available since PHP 4 and are stable.
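
To make the two points above concrete, here is a rough sketch of what I mean, built only on existing mbstring functions. The names mb_str_replace_sketch and utf8_or_recover are made up for this mail and are not part of mbstring or of your proposal; the second one only approximates the "check with utf8_is_valid, else fall back to utf8_recover" flow you describe, assuming that converting UTF-8 to UTF-8 is an acceptable way to substitute invalid sequences.

```php
<?php
// Hypothetical helper: mbstring has no mb_str_replace(), so this walks the
// subject with mb_strpos()/mb_substr() using character offsets.
function mb_str_replace_sketch($search, $replace, $subject, $encoding = 'UTF-8')
{
    if ($search === '') {
        return $subject; // nothing to search for
    }
    $searchLen = mb_strlen($search, $encoding);
    $result = '';
    $offset = 0;
    while (($pos = mb_strpos($subject, $search, $offset, $encoding)) !== false) {
        // Copy the text before the match, then the replacement.
        $result .= mb_substr($subject, $offset, $pos - $offset, $encoding) . $replace;
        $offset = $pos + $searchLen;
    }
    // Append whatever is left after the last match.
    return $result . mb_substr($subject, $offset, null, $encoding);
}

// Hypothetical helper: return valid input unchanged, otherwise a cleaned copy.
function utf8_or_recover($input)
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    // Assumption: a UTF-8 to UTF-8 conversion replaces invalid sequences
    // with the configured mbstring substitute character.
    return mb_convert_encoding($input, 'UTF-8', 'UTF-8');
}
```

Usage would be something like mb_str_replace_sketch('ä', 'ae', $name) on input that has first gone through utf8_or_recover($name). Whether a failed check should warn, return null or recover silently is exactly the part I would make configurable.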