ID:               40395
 User updated by:  jfrim at idirect dot com
 Reported By:      jfrim at idirect dot com
 Status:           Assigned
 Bug Type:         PCRE related
 Operating System: *
 PHP Version:      *
 Assigned To:      andrei
 New Comment:

I have verified that, along with 0x00, 0x22 (the
double-quote character) is also escaped.  No other byte values are
affected.

Even if the documentation were changed to reflect this escaping of
0x00 and 0x22, the behaviour would still be buggy, since
0x5C (the backslash character) is NOT escaped!

This creates an ambiguity if the input string to
preg_replace() contains a literal backslash followed by the digit zero,
or a backslash followed by a double-quote.  There would be no way to
tell from the resulting data whether those sequences are
escaped NULLs and escaped double-quotes, or literal
sequences from the input string.
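
To make the ambiguity concrete, here is a minimal sketch.  The helper
escape_as_reported() is hypothetical: it only imitates the escaping
described above (NUL and double-quote escaped, backslash left alone) and
is not a PHP API.

```php
<?php
// Hypothetical helper: mimics the escaping described in this report
// (NUL and double-quote escaped, backslash NOT escaped).
function escape_as_reported($s)
{
    return strtr($s, array("\0" => "\\0", '"' => '\\"'));
}

$with_nul     = "a\0b";   // 3 bytes: 'a', NUL, 'b'
$with_literal = 'a\0b';   // 4 bytes: 'a', backslash, '0', 'b'

// Both inputs collapse to the same 4-byte string, so the original
// input cannot be recovered from the escaped form.
var_dump(escape_as_reported($with_nul) === escape_as_reported($with_literal));
```

Two different inputs yield identical output, which is why a later
stripslashes() cannot safely undo the escaping.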

So the only way to fix this bug is to either...
...A: Escape the backslash as well, and change the documentation to
state that 0x00, 0x22, and 0x5C are escaped, or...
...B: Do not escape any characters.

I would say method B is preferred, since no stripslashes() would have
to be performed on the output of preg_replace(), and it's
far more intuitive to know that a regular expression
back-reference always contains the exact bytes that were
matched, without having to worry about special exceptions.
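
As a sketch of what "exact byte value" semantics look like in practice,
the reproduce code can be rewritten with preg_replace_callback(), which
hands the captured bytes to a function instead of splicing them into
evaluated code.  This is an illustration of the desired behaviour, not a
fix for the /e modifier, and it assumes PHP 5.3 or later for the
anonymous function.

```php
<?php
$inputstring = "ASCII NUL\0, SOH\01, STX\02, ETX\03";

// The callback receives the match array directly; $m[1] holds the raw
// captured byte, with no escaping applied to it.
$result = preg_replace_callback(
    '/([\\x00-\\x02])/',
    function ($m) {
        return '[' . ord($m[1]) . ']';
    },
    $inputstring
);

echo $result;
```

Here the NUL byte reaches ord() unchanged, so the first replacement is
"[0]" rather than "[92]".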


Previous Comments:
------------------------------------------------------------------------

[2007-02-08 13:17:59] [EMAIL PROTECTED]

Ok, so the problem here is that preg_do_eval() calls
php_addslashes_ex(), which escapes "'", "\" and "\0".
So we should either not escape the \0 or reflect the behaviour in the
docs.
Assigning to the extension maintainer.
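
For reference, a small sketch that prints what the userland addslashes()
function does to each candidate byte.  Whether addslashes() matches the
internal php_addslashes_ex() call byte for byte is an assumption here;
the sketch only shows the documented userland behaviour, with
magic_quotes_sybase assumed to be off.

```php
<?php
// addslashes() is the documented userland function; its equivalence to
// the internal call made by preg_do_eval() is assumed, not verified.
$samples = array(
    'NUL'          => "\0",
    'single quote' => "'",
    'double quote' => '"',
    'backslash'    => '\\',
);

// Print each byte in hex before and after escaping.
foreach ($samples as $name => $byte) {
    echo $name, ': ', bin2hex($byte), ' -> ', bin2hex(addslashes($byte)), "\n";
}
```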

------------------------------------------------------------------------

[2007-02-08 06:01:32] jfrim at idirect dot com

I'd also like to present bug #16590:

http://bugs.php.net/bug.php?id=16590

Note the following example they list as a SOLUTION to specifying NULLs
in the pattern:

preg_match("/\\x00/", "foo\0bar")

And note the following statement from bug report #16590:

"...The docs state that PCRE is binary safe..."


So if PCRE is binary safe, and you can specify NULLs in the pattern
with \x00, why are back references unable to return these matched
NULLs?!?!?

How is this NOT a bug?!??

------------------------------------------------------------------------

[2007-02-08 05:32:20] jfrim at idirect dot com

If the regular expression were /([\x00-\xFF])/ , you would think EVERY
possible byte value would be matched.  In fact, all of them do get
matched.  However, every one EXCEPT byte value 0x00 is returned unchanged
in the \1 back reference.  Any 0x00 byte is returned as two bytes,
0x5C followed by 0x30.
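
For comparison, a minimal sketch of the same capture group used with
preg_match_all() rather than in a replacement.  The sample string is
made up for illustration; the point is only to show what the capture
itself holds before any replacement processing.

```php
<?php
// Match every byte of a short sample and record what each capture holds.
$sample = "A\0B";
preg_match_all('/([\\x00-\\xFF])/', $sample, $matches);

// Print the length and hex value of each captured byte.
foreach ($matches[1] as $captured) {
    echo strlen($captured), ' byte(s): ', bin2hex($captured), "\n";
}
```

Each capture, including the NUL, is a single byte here, so the
two-byte form only appears once the back reference is substituted.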

I have not found in any Perl regular expression documentation an
explanation for why the 0x00 byte is handled like this, so could you
please tell me why this is NOT a bug with PCRE.

Thanks.

------------------------------------------------------------------------

[2007-02-08 00:26:14] [EMAIL PROTECTED]

Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.



------------------------------------------------------------------------

[2007-02-08 00:04:01] jfrim at idirect dot com

Description:
------------
The Perl-compatible regular expression engine is unable to output NULL
characters correctly.  This is evident with the preg_replace() function
(tested), and likely affects other PCRE functions (untested)
according to some other bug reports already submitted.  Instead of
returning a NULL character, a literal '\0' sequence is returned.


Reproduce code:
---------------
<?php
$inputstring = "ASCII NUL\0, SOH\01, STX\02, ETX\03";
echo preg_replace('/([\\x00-\\x02])/e', "'['.ord('\\1').']'", $inputstring);
?>

Expected result:
----------------
ASCII NUL[0], SOH[1], STX[2], ETX

(Note that "ETX" is immediately followed by ctrl char #3)


Actual result:
--------------
ASCII NUL[92], SOH[1], STX[2], ETX

(Note that "ETX" is immediately followed by ctrl char #3)

The "92" is present in place of what should be "0" because
preg_replace() incorrectly returns a literal '\0' sequence instead of a
NULL character, and the ord() function then returns the value of the
literal backslash.



------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=40395&edit=1
