From:             masakielastic at gmail dot com
Operating system: All
PHP version:      5.5.0
Package:          mbstring related
Bug Type:         Feature/Change Request
Bug description:New function for replacing ill-formed byte sequences with substitute characters

Description:
------------
A new function for replacing ill-formed byte sequences with substitute
characters is needed. The problem with using mb_convert_encoding for that
purpose is that the function name doesn't express the intent. Specifying the
same encoding twice is verbose, and beginners may read it as a meaningless
conversion.


$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
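
To make the effect of that idiom concrete, here is a minimal sketch. The sample input and the choice of U+FFFD as the substitute are assumptions made for illustration only; the substitute is set explicitly so the result does not depend on the configured default.

```php
<?php
// "\xFF" can never appear in well-formed UTF-8, so this input is ill-formed.
$input = "a\xFFb";

// Choose the substitute explicitly instead of relying on the configured default.
$previous = mb_substitute_character();
mb_substitute_character(0xFFFD); // U+FFFD REPLACEMENT CHARACTER

// Converting from an encoding to itself replaces the ill-formed bytes.
$scrubbed = mb_convert_encoding($input, 'UTF-8', 'UTF-8');

// Restore the global setting so other code is not affected.
mb_substitute_character($previous);

var_dump(mb_check_encoding($input, 'UTF-8'));    // validity of the raw input
var_dump(mb_check_encoding($scrubbed, 'UTF-8')); // validity after scrubbing
```

Nothing in the call itself says "scrub", which is the readability problem described above.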

Ruby offers a precedent: Ruby 2.1 introduces String#scrub.

http://bugs.ruby-lang.org/issues/6752
https://github.com/ruby/ruby/blob/1e8a05c1dfee94db9b6b825097e1d192ad32930a/string.c#L7770-L7783

Whether the caller should be able to specify the substitute character
needs to be discussed.

function mb_scrub($str, $encoding = '', $substitute = '')
{
    // Fall back to the internal encoding when none is given.
    if ('' === $encoding) {
        $encoding = mb_internal_encoding();
    }

    if ('' === $substitute) {
        // No substitute given: use the currently configured one.
        $ret = mb_convert_encoding($str, $encoding, $encoding);
    } else {
        // Temporarily switch the global substitute character, then restore it.
        $before_substitute = mb_substitute_character();
        mb_substitute_character($substitute);
        $ret = mb_convert_encoding($str, $encoding, $encoding);
        mb_substitute_character($before_substitute);
    }

    return $ret;
}
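
Because mb_substitute_character() changes a global setting, the save-and-restore step above is the part that is easiest to get wrong. The following is a sketch of one way to make the restore unconditional, using try/finally (available since PHP 5.5). The function name scrub_with_substitute and the sample input are hypothetical and exist only for this illustration; they are not part of the proposal.

```php
<?php
// Hypothetical helper name, used only for this sketch.
function scrub_with_substitute($str, $substitute, $encoding = 'UTF-8')
{
    $before = mb_substitute_character();
    mb_substitute_character($substitute);
    try {
        return mb_convert_encoding($str, $encoding, $encoding);
    } finally {
        // Runs on normal return and on error, so the global is always restored.
        mb_substitute_character($before);
    }
}

$before = mb_substitute_character();

// "\xE9" starts a multi-byte UTF-8 sequence that is cut off by the end of the string.
$clean = scrub_with_substitute("caf\xE9", 0x3F); // 0x3F is "?"

var_dump(mb_check_encoding($clean, 'UTF-8'));
```

A built-in function could avoid touching the global setting altogether, which is one argument for letting the caller pass the substitute character directly.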

The same discussion applies to UConverter.

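function uconverter_scrub($str, $encoding, $opts = '')
{
    if ('' === $opts) {
        // No options given: rely on the converter's defaults.
        return UConverter::transcode($str, $encoding, $encoding);
    } else {
        // $opts is the options array accepted by UConverter::transcode.
        return UConverter::transcode($str, $encoding, $encoding, $opts);
    }
}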

Standard string functions and filter functions may also need to be
discussed, since htmlspecialchars can be used for the same purpose.

function str_scrub($str, $encoding = 'UTF-8')
{
    return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, $encoding));
}
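
As a small illustration of why htmlspecialchars can serve this purpose, the sketch below compares the call with and without ENT_SUBSTITUTE. The sample input and variable names are assumptions made for this example; the flags are passed explicitly so the result does not depend on version-specific defaults.

```php
<?php
// Ill-formed UTF-8 ("\xFF") followed by a character that gets escaped.
$input = "a\xFFb<";

// With ENT_SUBSTITUTE, invalid sequences become U+FFFD instead of failing.
$escaped = htmlspecialchars($input, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');

// Without ENT_SUBSTITUTE (or ENT_IGNORE), invalid input yields an empty string.
$plain = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

// Decoding undoes the escaping but keeps the substituted character.
$roundtrip = htmlspecialchars_decode($escaped, ENT_COMPAT);

var_dump($escaped, $plain, $roundtrip);
```

The round trip works, but it escapes and unescapes the whole string only to get the substitution as a side effect, and it always substitutes U+FFFD.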


-- 
Edit bug report at https://bugs.php.net/bug.php?id=65081&edit=1