 ID:                 65081
 Updated by:         a...@php.net
 Reported by:        masakielastic at gmail dot com
 Summary:            new function for replacing ill-formed byte sequences
                     with substitute characters
 Status:             Open
 Type:               Feature/Change Request
 Package:            mbstring related
 Operating System:   All
 PHP Version:        5.5.0
 Block user comment: N
 Private report:     N

 New Comment:

Related to bug #65045.


Previous Comments:
------------------------------------------------------------------------
[2013-06-21 03:20:55] masakielastic at gmail dot com

Description:
------------
A new function for replacing ill-formed byte sequences with substitute characters
is needed. The problem with using mb_convert_encoding for that purpose is that the
function name doesn't express the intent. Specifying the same encoding twice is
verbose, and beginners may read it as a meaningless conversion.

$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
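A quick way to see what this idiom actually does, as a minimal sketch assuming the mbstring extension is loaded (the sample string is made up):

```php
<?php
// "abc" followed by 0xFF, a byte that never occurs in well-formed UTF-8.
$str = "abc\xFF";
var_dump(mb_check_encoding($str, 'UTF-8'));   // bool(false)

// Converting UTF-8 to UTF-8 re-validates the bytes; ill-formed sequences
// are replaced by the current mb_substitute_character() setting.
$clean = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)
```

Nothing in the call itself says "replace invalid bytes", which is the readability problem described above.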

A precedent can be found in Ruby: Ruby 2.1 introduces String#scrub.

http://bugs.ruby-lang.org/issues/6752
https://github.com/ruby/ruby/blob/1e8a05c1dfee94db9b6b825097e1d192ad32930a/string.c#L7770-L7783

Whether the substitute character should be configurable still needs to be discussed.

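function mb_scrub($str, $encoding = '', $substitute = '')
{
    if ('' === $encoding) {
        $encoding = mb_internal_encoding();
    }

    if ('' === $substitute) {
        // Converting an encoding to itself replaces ill-formed byte
        // sequences with the current substitute character.
        $ret = mb_convert_encoding($str, $encoding, $encoding);
    } else {
        // Temporarily switch to the requested substitute character,
        // then restore the previous global setting.
        $before_substitute = mb_substitute_character();
        mb_substitute_character($substitute);
        $ret = mb_convert_encoding($str, $encoding, $encoding);
        mb_substitute_character($before_substitute);
    }

    return $ret;
}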
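One possible refinement, sketched under the assumption of PHP 5.5 or later (for try/finally): restore the previous substitute character in a finally block, so the global setting is not left changed if the conversion fails. The name mb_scrub_with is hypothetical, chosen so it cannot clash with the built-in mb_scrub() that PHP 7.2 and later provide; null is used as the "not given" marker instead of ''.

```php
<?php
// Hypothetical name; avoids clashing with the built-in mb_scrub() (PHP >= 7.2).
function mb_scrub_with($str, $encoding = null, $substitute = null)
{
    if (null === $encoding) {
        $encoding = mb_internal_encoding();
    }
    if (null === $substitute) {
        // Same encoding on both sides: only ill-formed sequences change.
        return mb_convert_encoding($str, $encoding, $encoding);
    }
    $before_substitute = mb_substitute_character();
    mb_substitute_character($substitute);
    try {
        return mb_convert_encoding($str, $encoding, $encoding);
    } finally {
        // Runs on normal return and on exception alike, so the global
        // substitute character is always restored.
        mb_substitute_character($before_substitute);
    }
}

// U+FFFD REPLACEMENT CHARACTER as the substitute
var_dump(bin2hex(mb_scrub_with("abc\xFF", 'UTF-8', 0xFFFD))); // string(12) "616263efbfbd"
// "none" drops ill-formed sequences entirely
var_dump(mb_scrub_with("abc\xFF", 'UTF-8', 'none'));           // string(3) "abc"
```

Saving and restoring a global setting around every call is exactly the kind of boilerplate a dedicated function would hide.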

The same discussion applies to UConverter.

function uconverter_scrub($str, $encoding, $opts = array())
{
    if (array() === $opts) {
        return UConverter::transcode($str, $encoding, $encoding);
    } else {
        // $opts is an options array, e.g. array('to_subst' => '?')
        return UConverter::transcode($str, $encoding, $encoding, $opts);
    }
}

Standard string functions and filter functions may also need to be discussed,
since htmlspecialchars can be used for the same purpose.

function str_scrub($str, $encoding = 'UTF-8')
{
    // ENT_SUBSTITUTE replaces invalid code unit sequences with U+FFFD (for UTF-8).
    return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, $encoding));
}
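ENT_SUBSTITUTE is what makes this trick work. A small sketch, assuming UTF-8 input (the sample strings are made up), contrasting the call with and without the flag and showing that the escape/unescape round trip leaves valid text, including literal entity text, unchanged:

```php
<?php
$bad = "a&b\xFF";

// Without ENT_SUBSTITUTE or ENT_IGNORE, invalid input yields an empty string.
var_dump(htmlspecialchars($bad, ENT_QUOTES, 'UTF-8'));       // string(0) ""

// With ENT_SUBSTITUTE the bad byte becomes U+FFFD; decoding restores "&".
$escaped = htmlspecialchars($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$scrubbed = htmlspecialchars_decode($escaped, ENT_QUOTES);
var_dump(bin2hex($scrubbed));                                // string(12) "612662efbfbd"

// Text that already looks like an entity survives:
// "&amp;" -> "&amp;amp;" -> "&amp;"
$literal = 'x &amp; y';
$round = htmlspecialchars_decode(
    htmlspecialchars($literal, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
    ENT_QUOTES
);
var_dump($round);                                            // string(9) "x &amp; y"
```

It works, but it escapes and unescapes the whole string only to get the substitution side effect, which again hides the intent.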



------------------------------------------------------------------------


