Re: [PHP-DEV] Proposal for better UTF-8 handling

Martin Keckeis Thu, 23 May 2013 23:35:16 -0700

Hello Rouven,

The lack of "good" UTF-8 support is a long-standing topic in PHP, and
improvements (at least I think) are very welcome here!


> Before I write an RFC I'd like to get some feedback what you think about
> adding the following functions to PHP 5.6 (possibly more to follow):
> utf8_is_valid, utf8_strlen,  utf8_substr, utf8_strpos, utf8_strrpos,
> utf8_str_split, utf8_strrev, utf8_recover, utf8_chr, utf8_ord,
> string_is_ascii.
>
> Most of them (exceptions are utf8_chr, utf8_is_valid, utf8_recover and
> string_is_ascii) are currently written in a way that they emit a warning
> when they encounter invalid UTF-8 and return with null. This should
> encourage applications to check their input with utf8_is_valid and either
> stop further processing or to fall back to utf8_recover to get a valid
> string. This should improve security since there are attack vectors when
> malformed sequences get interpreted as another encoding.
>
>
I'm currently using the multibyte "mb_" functions and I'm generally happy
with them. Since I run my own webserver, enabling this extension is no
problem for me. The biggest problem I have had with the extension is that
not every standard string function has an "mb_" counterpart.
I think the most famous missing one is mb_str_replace.
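
To illustrate the gap, here is a minimal sketch of what such a function
could look like in userland, built only on mb_strlen, mb_strpos and
mb_substr. The name mb_str_replace_sketch and its signature are my own
assumptions; mbstring does not ship such a function.

```php
<?php
// Hypothetical helper: mb_str_replace() is NOT part of the mbstring extension.
// It replaces every occurrence of $search in $subject, working in characters
// of the given encoding rather than in bytes.
function mb_str_replace_sketch($search, $replace, $subject, $encoding = 'UTF-8')
{
    if ($search === '') {
        return $subject; // nothing to search for
    }

    $searchLen  = mb_strlen($search, $encoding);
    $subjectLen = mb_strlen($subject, $encoding);
    $result     = '';
    $offset     = 0; // character offset, not byte offset

    while (($pos = mb_strpos($subject, $search, $offset, $encoding)) !== false) {
        // Copy the text between the previous match and this one, then the replacement.
        $result .= mb_substr($subject, $offset, $pos - $offset, $encoding) . $replace;
        $offset  = $pos + $searchLen;
    }

    // Append whatever follows the last match.
    return $result . mb_substr($subject, $offset, $subjectLen - $offset, $encoding);
}
```

For strings that are already valid UTF-8, plain str_replace gives the same
result, because a valid UTF-8 sequence cannot match in the middle of
another character. A helper like this matters mostly for other multibyte
encodings.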

Something to think about:
Why not combine your work with the mb_ extension? Whether a warning is
emitted could be controlled by a setting, either in the ini file or by
calling a function.
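
A rough sketch of the "calling a function" variant, layered on the existing
mb_ functions. The names utf8_strict_mode and utf8_strlen_sketch are
hypothetical and only illustrate the idea; a real implementation in the
extension could just as well use an ini directive.

```php
<?php
// Hypothetical switch: returns the current mode, and changes it when called
// with true/false. Strict mode is on by default.
function utf8_strict_mode($enable = null)
{
    static $strict = true;
    if ($enable !== null) {
        $strict = (bool) $enable;
    }
    return $strict;
}

// Hypothetical wrapper around mb_strlen() with the proposed behaviour:
// in strict mode, invalid UTF-8 emits a warning and yields null.
function utf8_strlen_sketch($string)
{
    if (!mb_check_encoding($string, 'UTF-8')) {
        if (utf8_strict_mode()) {
            trigger_error('Invalid UTF-8 sequence in argument', E_USER_WARNING);
            return null;
        }
        // Lenient mode: let mbstring substitute malformed sequences first.
        $string = mb_convert_encoding($string, 'UTF-8', 'UTF-8');
    }
    return mb_strlen($string, 'UTF-8');
}
```

With strict mode on, utf8_strlen_sketch("a\xFFb") warns and returns null;
after utf8_strict_mode(false) the same call returns a length for the
repaired string instead.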

I would rather have one complete "mb/utf-8" lib than yet another one. As
you have already written, there are already some out there... and for core
I would currently prefer "mb_" because those functions have been available
since PHP 4 and are stable.
