Edit report at https://bugs.php.net/bug.php?id=65081&edit=1
ID:                 65081
Updated by:         a...@php.net
Reported by:        masakielastic at gmail dot com
Summary:            new function for replacing ill-formed byte sequences with substitute characters
Status:             Open
Type:               Feature/Change Request
Package:            mbstring related
Operating System:   All
PHP Version:        5.5.0
Block user comment: N
Private report:     N

New Comment:

Related to bug #65045.

Previous Comments:
------------------------------------------------------------------------
[2013-06-21 03:20:55] masakielastic at gmail dot com

Description:
------------
A new function for replacing ill-formed byte sequences with substitute characters is needed.

The problem with using mb_convert_encoding for that purpose is that the function name doesn't convey the intent. Specifying the same encoding twice is verbose, and beginners may read it as a meaningless conversion.

    $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');

Ruby offers a precedent: Ruby 2.1 introduces String#scrub.

http://bugs.ruby-lang.org/issues/6752
https://github.com/ruby/ruby/blob/1e8a05c1dfee94db9b6b825097e1d192ad32930a/string.c#L7770-L7783

Whether the caller should be able to specify the substitute character needs discussion.

    function mb_scrub($str, $encoding = '', $substitute = '')
    {
        if ('' === $encoding) {
            $encoding = mb_internal_encoding();
        }

        if ('' === $substitute) {
            // Use the currently configured substitute character.
            $ret = mb_convert_encoding($str, $encoding, $encoding);
        } else {
            // Temporarily switch the substitute character, then restore it.
            $before_substitute = mb_substitute_character();
            mb_substitute_character($substitute);
            $ret = mb_convert_encoding($str, $encoding, $encoding);
            mb_substitute_character($before_substitute);
        }

        return $ret;
    }

The same discussion applies to UConverter.

    function uconverter_scrub($str, $encoding, $opts = null)
    {
        if (null === $opts) {
            return UConverter::transcode($str, $encoding, $encoding);
        } else {
            // $opts is the options array accepted by UConverter::transcode.
            return UConverter::transcode($str, $encoding, $encoding, $opts);
        }
    }

Standard string functions and filter functions may also need discussion, since htmlspecialchars can be used for the same purpose.
    function str_scrub($str, $encoding = 'UTF-8')
    {
        return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, $encoding));
    }

------------------------------------------------------------------------