Edit report at https://bugs.php.net/bug.php?id=53823&edit=1

 ID:                 53823
 Comment by:         robertbasic dot com at gmail dot com
 Reported by:        keith at chaos-realm dot net
 Summary:            preg_replace: * qualifier on unicode replace garbles
                     the string
 Status:             Verified
 Type:               Bug
 Package:            PCRE related
 Operating System:   Linux
 PHP Version:        5.3SVN-2011-01-23 (snap)
 Block user comment: N
 Private report:     N

 New Comment:

I tried my best on this one. Tested against the trunk:
svn info | grep Revision
Revision: 323476

I created a test file for this, will attach.

I ran the following with gdb:

$ gdb sapi/cgi/php-cgi

and then set a breakpoint

(gdb) break php_pcre.c:1318

finally ran the test script like:

(gdb) run run-tests.php ext/pcre/tests/bug53823.phpt

On https://gist.github.com/1904467 I copied and pasted some output from gdb,
but it might be incorrect, as I'm fairly new to all this. Anyway, lines 12 and
22 in that gist caught my attention.

Also, I think the same issue exists for preg_filter, too.
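
A quick way to check the preg_filter suspicion is to run both functions with
the same pattern and compare the raw bytes. This is only a sketch: the variable
names are made up for illustration, and the subject is written as explicit
UTF-8 bytes so the file encoding does not matter.

```php
<?php
// 'áéíóú' as explicit UTF-8 bytes (illustrative input taken from the report).
$subject = "\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3\xC3\xBA";
$pattern = '/[^\pL\pM]*/iu';

$replaced = preg_replace($pattern, '', $subject);
$filtered = preg_filter($pattern, '', $subject);

// The * quantifier allows empty matches, so preg_filter() should treat the
// subject as matched and return a string rather than null.
echo 'preg_replace: ', bin2hex((string) $replaced), "\n";
echo 'preg_filter:  ', bin2hex((string) $filtered), "\n";
var_dump($replaced === $filtered);
```

If both lines show the same damaged byte sequence, that would support the idea
that the two functions share the problem.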


Previous Comments:
------------------------------------------------------------------------
[2011-01-26 08:02:54] ahar...@php.net

Verified on 5.3 and trunk.

------------------------------------------------------------------------
[2011-01-23 18:10:44] tino dot didriksen at gmail dot com

...and then I forget to change the *. Let's try that again...

These work as expected:
echo preg_replace('/[^\pL\pM]+/iu', '', 'áéíóú');
echo preg_replace('/[^\pL\pM\pN]+/iu', '', 'áéíóú');
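
Since the replacement is an empty string, replacing an empty match changes
nothing, so + should strip exactly the same characters as * was meant to. A
small sketch with a mixed subject (the subject string is a made-up example, not
from the report):

```php
<?php
// Hypothetical mixed subject: 'áé 12 íóú!' written as explicit UTF-8 bytes.
$subject = "\xC3\xA1\xC3\xA9 12 \xC3\xAD\xC3\xB3\xC3\xBA!";

// + needs at least one character per match, so no empty matches occur.
$lettersOnly      = preg_replace('/[^\pL\pM]+/iu', '', $subject);
$lettersAndDigits = preg_replace('/[^\pL\pM\pN]+/iu', '', $subject);

echo bin2hex($lettersOnly), "\n";   // bytes of the letters-only result
echo $lettersAndDigits, "\n";       // letters plus the digits 1 and 2
```

The first pattern drops the spaces, digits and punctuation; the second keeps
the digits because \pN is added to the exceptions.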

------------------------------------------------------------------------
[2011-01-23 18:09:23] tino dot didriksen at gmail dot com

A workaround is to use + instead of *.

These work as expected:
echo preg_replace('/[^\pL\pM]*/iu', '', 'áéíóú');
echo preg_replace('/[^\pL\pM\pN]*/iu', '', 'áéíóú');

------------------------------------------------------------------------
[2011-01-23 18:04:49] keith at chaos-realm dot net

.

------------------------------------------------------------------------
[2011-01-23 18:00:57] keith at chaos-realm dot net

Description:
------------
When using the following test script to strip out all Unicode characters except
letters, the string becomes garbled when the * quantifier is added; the only
character that survives intact is ú.

Also, if you add \pN to the exceptions it additionally preserves the ó.

Verified on 5.2, 5.3 and 5.3-SNAP.


Test script:
---------------
echo preg_replace('/[^\pL\pM]*/iu', '', 'áéíóú');
or
echo preg_replace('/[^\pL\pM\pN]*/iu', '', 'áéíóú');

Expected result:
----------------
áéíóú

Actual result:
--------------
����ú
or 
���óú (if \pN is added to the exceptions).
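
Because the garbled bytes display differently depending on the terminal, it may
help to look at the result as hex and test whether it is still valid UTF-8. The
following is a minimal sketch; describe_result() is a hypothetical helper, not
part of PHP, and the validity check relies on preg_match() with the u modifier
returning false for invalid UTF-8 subjects.

```php
<?php
// Hypothetical helper: run the replacement and describe the raw result.
function describe_result($pattern, $subject)
{
    $result = preg_replace($pattern, '', $subject);
    return array(
        'result'     => $result,
        'hex'        => bin2hex((string) $result),
        // An empty /u pattern matches any valid UTF-8 string and fails otherwise.
        'valid_utf8' => preg_match('//u', (string) $result) === 1,
    );
}

// 'áéíóú' as explicit UTF-8 bytes, so the file encoding does not matter.
$subject = "\xC3\xA1\xC3\xA9\xC3\xAD\xC3\xB3\xC3\xBA";

foreach (array('/[^\pL\pM]*/iu', '/[^\pL\pM\pN]*/iu') as $pattern) {
    $info = describe_result($pattern, $subject);
    printf(
        "%s => hex=%s valid_utf8=%s\n",
        $pattern,
        $info['hex'],
        $info['valid_utf8'] ? 'yes' : 'no'
    );
}
```

The untouched input is c3a1c3a9c3adc3b3c3ba in hex, so any other hex output
means bytes were removed.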


------------------------------------------------------------------------


