From:             phpwnd at gmail dot com
Operating system:
PHP version:      5.3CVS-2009-02-28 (CVS)
PHP Bug Type:     PCRE related
Bug description:  PCRE fails on Unicode surrogates
Description:
------------
According to http://docs.php.net/manual/en/regexp.reference.php, PCRE functions should be able to match surrogates in Unicode mode. However, it is my understanding that surrogates are not allowed in UTF-8, which is the encoding used by the Unicode mode. That would explain why preg_match() and preg_replace() fail when operating on UTF-8-encoded surrogates. Note that the two functions fail in different ways: preg_match() returns 0, whereas preg_replace() returns NULL.

I'm not sure what the fix should be. Being able to match surrogates would make my life easier, but if it's not valid UTF-8 then it might be more consistent (albeit in a twisted way) to return NULL, as that's what PCRE functions do on invalid UTF-8.

Reproduce code:
---------------
// \xED\xA0\x80 is the UTF-8-style encoding of code point 0xD800
var_dump(preg_match('#.#u', ".\xED\xA0\x80"));
var_dump(preg_replace('#\p{Cs}#u', '', ".\xED\xA0\x80"));

Expected result:
----------------
int(1)
string(1) "."

Actual result:
--------------
int(0)
NULL

--
Edit bug report at http://bugs.php.net/?id=47526&edit=1
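A possible workaround, offered only as a sketch: since the surrogate range 0xD800-0xDFFF always encodes to the byte pattern ED A0..BF 80..BF, the offending sequences can be removed with a byte-oriented pattern (no "u" modifier) before any Unicode-mode matching is attempted. The helper name strip_encoded_surrogates() is hypothetical and not part of PHP, and dropping the bytes rather than rejecting the input is an assumption about what the caller wants.

```php
<?php
// Hypothetical helper, not part of PHP: removes UTF-8-style encodings of
// surrogates (0xD800-0xDFFF), which always have the form ED A0..BF 80..BF.
// The pattern deliberately omits the "u" modifier so PCRE works on raw bytes
// and never runs its UTF-8 validity check on the subject.
function strip_encoded_surrogates($subject)
{
    return preg_replace('/\xED[\xA0-\xBF][\x80-\xBF]/', '', $subject);
}

$subject = ".\xED\xA0\x80";
$clean   = strip_encoded_surrogates($subject);

var_dump($clean);                      // string(1) "."
var_dump(preg_match('#.#u', $clean));  // int(1)

// When stripping is not acceptable, preg_last_error() can be compared against
// PREG_NO_ERROR after a Unicode-mode call to tell a failed call apart from a
// genuine non-match.
preg_match('#.#u', $subject);
if (preg_last_error() !== PREG_NO_ERROR) {
    echo "PCRE reported an error for the raw subject\n";
}
```

In a valid UTF-8 string the byte ED is only ever a lead byte and must be followed by 80..9F, so the pattern should not touch well-formed text.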
