Edit report at http://bugs.php.net/bug.php?id=52333&edit=1

 ID:               52333
 Updated by:       ahar...@php.net
 Reported by:      a dot dobkin at drweb dot com
 Summary:          Metacharacter \d in a regexp causes an error on some
                   Russian letters
-Status:           Open
+Status:           Bogus
 Type:             Bug
 Package:          *Regular Expressions
 Operating System: Windows
 PHP Version:      5.3.2

 New Comment:

This is an encoding issue rather than a bug in PHP itself: by default,
preg_match() works like most things in PHP and just treats strings as a
series of bytes. If Василий is encoded in UTF-16, several of its bytes
fall in the range that represents digits in ASCII, so \d matches them.
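
A minimal sketch of the byte-level effect described above. It assumes the
source file is saved as UTF-8, that the mbstring extension is loaded, and
that the failing input was UTF-16BE without a BOM; none of these are
confirmed by the report. The pattern is reduced to a bare \d for clarity.

```php
<?php
// Assumption: this file is saved as UTF-8.
$name = "Василий";

// Re-encode as UTF-16BE (no BOM) so the byte layout is predictable.
$utf16 = mb_convert_encoding($name, 'UTF-16BE', 'UTF-8');

// Without /u, PCRE matches byte by byte, so \d sees the raw bytes.
preg_match_all('/\d/', $utf16, $m);

echo bin2hex($utf16), "\n";      // 0412043004410438043b04380439
echo implode(',', $m[0]), "\n";  // 0,8,8,9
```

The low bytes 0x30, 0x38 and 0x39 of the letters а, и and й are the ASCII
codes for '0', '8' and '9', which is why the byte-wise \d finds them.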



preg_match() does support Unicode text when it's encoded as UTF-8, via
the /u modifier, so the right way to handle this would be to use iconv()
or mb_convert_encoding() to convert the string to UTF-8, then use a
regex like "/[\d...@\#\%\$\^&*\(\)\~\=\/\|\"\'\?\:\;\/]+/u" to force
UTF-8 mode.
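
The conversion suggested above might look roughly like this. It is only a
sketch: the input is assumed to be UTF-16BE (written out as explicit byte
escapes so the example does not depend on the file's own encoding), the
mbstring extension is assumed to be available, and the pattern is cut
down to \d because the full character class is partly elided in this
message.

```php
<?php
// Assumed input: "Василий" as raw UTF-16BE bytes.
$utf16 = "\x04\x12\x04\x30\x04\x41\x04\x38\x04\x3B\x04\x38\x04\x39";

// Convert to UTF-8 before matching; iconv('UTF-16BE', 'UTF-8', $utf16)
// would be the iconv equivalent.
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');

// Byte-wise match on the UTF-16 data: bytes 0x30/0x38/0x39 look like digits.
$bytewise = preg_match('/\d/', $utf16);

// UTF-8 mode via /u: the Cyrillic letters are treated as whole characters.
$unicode = preg_match('/\d/u', $utf8);

echo $bytewise === 1 ? 'ERR' : 'OK', "\n";  // ERR
echo $unicode === 1 ? 'ERR' : 'OK', "\n";   // OK
```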


Previous Comments:
------------------------------------------------------------------------
[2010-07-14 07:32:43] a dot dobkin at drweb dot com

OS: Windows Server 2003 R2 SP2, English, x86

------------------------------------------------------------------------
[2010-07-14 07:03:15] a dot dobkin at drweb dot com

Description:
------------
The metacharacter \d in a regular expression causes an error on some
Russian letters on Windows.



Example script:

$user_name_ru = "Василий";
$regexp = "/[\d...@\#\%\$\^&*\(\)\~\=\/\|\"\'\?\:\;\/]+/";
if (preg_match($regexp, $user_name_ru)) {
    echo 'ERR';
} else {
    echo 'OK';
}



preg_match() returns true if the word contains one or more of the
characters 'й', 'г', 'в'. If the metacharacter '\d' is removed,
preg_match() returns false. With PHP version 5.2.13, everything works
correctly.



------------------------------------------------------------------------