From:             phpwnd at gmail dot com
Operating system: 
PHP version:      5.3CVS-2009-02-28 (CVS)
PHP Bug Type:     PCRE related
Bug description:  PCRE fails on Unicode surrogates

Description:
------------
According to http://docs.php.net/manual/en/regexp.reference.php, PCRE
functions should be able to match surrogates in Unicode mode. However, it
is my understanding that surrogates (code points U+D800 to U+DFFF) are not
allowed in UTF-8, which is the encoding used by Unicode mode. That would
explain why preg_match() and preg_replace() fail when operating on
UTF-8-encoded surrogates.
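To see why the bytes \xED\xA0\x80 stand for U+D800, here is a minimal sketch that applies the plain UTF-8 bit layout with no validity checks. The helper names encode_code_point() and is_surrogate() are hypothetical, not PHP built-ins. RFC 3629 excludes U+D800 to U+DFFF from UTF-8, so a strict validator rejects the bytes this produces for a surrogate.

```php
<?php
// Hypothetical helper: encode a code point using the generic UTF-8 bit layout.
// It deliberately skips the RFC 3629 rule that excludes U+D800..U+DFFF,
// so it will produce the (invalid) byte sequence for a surrogate.
function encode_code_point($cp)
{
    if ($cp < 0x80) {
        return chr($cp);                              // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                 // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));              // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))                // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))        // 10xxxxxx
             . chr(0x80 | ($cp & 0x3F));              // 10xxxxxx
    }
    return chr(0xF0 | ($cp >> 18))                    // 11110xxx
         . chr(0x80 | (($cp >> 12) & 0x3F))           // 10xxxxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))            // 10xxxxxx
         . chr(0x80 | ($cp & 0x3F));                  // 10xxxxxx
}

// Hypothetical helper: surrogates occupy U+D800..U+DFFF.
function is_surrogate($cp)
{
    return $cp >= 0xD800 && $cp <= 0xDFFF;
}

echo bin2hex(encode_code_point(0xD800)), "\n"; // eda080
var_dump(is_surrogate(0xD800));                // bool(true)
```

The same arithmetic gives ED BF BF for U+DFFF, so every encoded surrogate starts with \xED followed by a byte in the range \xA0 to \xBF.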

Note that the two functions fail in different ways: preg_match() returns 0,
whereas preg_replace() returns NULL.
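Because a 0 from preg_match() normally means "no match", a caller cannot tell it apart from a real failure without checking the error state. The following sketch distinguishes the cases with preg_last_error() and PREG_BAD_UTF8_ERROR; the wrapper name describe_preg_match() is hypothetical. What it reports for the surrogate string from this bug may depend on the PHP and bundled PCRE version, so no output is claimed for that call.

```php
<?php
// Hypothetical wrapper: classify a preg_match() result as a match,
// a genuine non-match, or a PCRE error (preg_match() returns false on error).
function describe_preg_match($pattern, $subject)
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        $err = preg_last_error();
        return $err === PREG_BAD_UTF8_ERROR ? 'error: bad UTF-8' : "error: code $err";
    }
    return $result === 1 ? 'match' : 'no match';
}

// Result depends on whether the PCRE build rejects encoded surrogates.
var_dump(describe_preg_match('#.#u', ".\xED\xA0\x80"));
```

For preg_replace(), the analogous check is a strict comparison of the return value against NULL before consulting preg_last_error().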

I'm not sure what the fix should be. Being able to match surrogates would
make my life easier, but if they are not valid UTF-8, it might be more
consistent (albeit in a twisted way) for both functions to return an error
value (NULL in the case of preg_replace()), as PCRE functions do on invalid
UTF-8.
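Until this is resolved, one possible workaround, sketched here as an assumption rather than a fix for the bug, is to drop the u modifier and match the raw bytes of encoded surrogates (\xED, then \xA0 to \xBF, then a continuation byte \x80 to \xBF). Without u, PCRE does not validate the subject as UTF-8, so these sequences can be matched. The helper name strip_encoded_surrogates() is hypothetical.

```php
<?php
// Hypothetical helper: remove UTF-8-style encoded surrogates (U+D800..U+DFFF).
// Byte mode (no /u), so PCRE matches raw bytes and does not reject the subject.
// \xED is never a continuation byte, so this cannot split a valid character.
function strip_encoded_surrogates($s)
{
    // ED A0 80 (U+D800) through ED BF BF (U+DFFF)
    return preg_replace('/\xED[\xA0-\xBF][\x80-\xBF]/', '', $s);
}

$clean = strip_encoded_surrogates(".\xED\xA0\x80");
var_dump($clean);                        // string(1) "."
var_dump(preg_match('/^.$/u', $clean));  // int(1)
```

Once the surrogate bytes are gone, the remaining string can be passed to the Unicode-mode functions as usual.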

Reproduce code:
---------------
// \xED\xA0\x80 is character 0xD800 in UTF-8
var_dump(preg_match('#.#u', ".\xED\xA0\x80"));
var_dump(preg_replace('#\p{Cs}#u', '', ".\xED\xA0\x80"));

Expected result:
----------------
int(1)
string(1) "."

Actual result:
--------------
int(0)
NULL

Bug report: http://bugs.php.net/?id=47526