ID: 41216
User updated by: DPP <paul dot dovbush at gmail dot com>
Reported By: DPP <paul dot dovbush at gmail dot com>
Status: Open
Bug Type: PCRE related
Operating System: WinXPsp2
PHP Version: 5.2.1
New Comment:
I forgot to mention: the file contains Russian text encoded in UTF-8.
Without the PCRE_UTF8 modifier the regexp fails on the Russian letter "R".
Previous Comments:
------------------------------------------------------------------------
[2007-04-27 17:26:59] DPP <paul dot dovbush at gmail dot com>
Description:
------------
Parsing a file with 10000 lines of the following format:
level + delim + [@xref_id@ + delim +] tag + [delim + line_value +] terminator
level       digit
delim       space
xref_id     alphanum
tag         alpha (English)
line_value  any (except terminator)
terminator  \r\n
With regexp:
$c=preg_match_all("/^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$/Sm",$fp,$m,PREG_PATTERN_ORDER);
Setting the PCRE_UTF8 modifier slows the whole script down 30 times (from
300 ms to 9000 ms).
Perhaps a more accurate regexp here would be:
$c=preg_match_all("/^ *(\d+) +(@([EMAIL PROTECTED])@ +)?([^ \\n]+)(
+@([EMAIL PROTECTED])@ *| +[^\\n]*)?$/m",$fp,$m,PREG_PATTERN_ORDER);
But it changes nothing.
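
A minimal sketch for reproducing the timing comparison, assuming a
synthetic input rather than the real file. The sample line, its Cyrillic
value, and the helper name time_match are hypothetical and only stand in
for the data described above; the pattern is the first one from this
report, run once with and once without the "u" (PCRE_UTF8) modifier.

```php
<?php
// Hypothetical sample data: one line in the described format, repeated
// 10000 times. The Cyrillic value is an assumption, not the real file.
$line = "1 @I1@ NAME Привет мир\r\n";
$fp = str_repeat($line, 10000);

// Body of the first pattern from the report, without delimiters/modifiers.
$body = '^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$';

// Hypothetical helper: runs preg_match_all once and returns
// array(match count, elapsed seconds).
function time_match($pattern, $subject)
{
    $start = microtime(true);
    $count = preg_match_all($pattern, $subject, $m, PREG_PATTERN_ORDER);
    return array($count, microtime(true) - $start);
}

list($plainCount, $plainTime) = time_match('/' . $body . '/Sm', $fp);
list($utf8Count, $utf8Time) = time_match('/' . $body . '/Smu', $fp);

printf("without u: %d matches in %.3f s\n", $plainCount, $plainTime);
printf("with u:    %d matches in %.3f s\n", $utf8Count, $utf8Time);
```

Timings will depend on the PHP and PCRE versions in use, so the numbers
from this sketch are only meaningful relative to each other.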
------------------------------------------------------------------------