Edit report at https://bugs.php.net/bug.php?id=54223&edit=1

 ID:                 54223
 Comment by:         danielklein at airpost dot net
 Reported by:        carsten_sttgt at gmx dot de
 Summary:            enhance / change newline behavior
 Status:             Open
 Type:               Feature/Change Request
 Package:            PCRE related
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

This is very old, well defined behaviour.

/$/ is equivalent to /(?=\n?(?![\w\W]))/, that is, it matches just before a 
newline with no remaining characters, or at end-of-string (no more characters). 
What you are asking for is /(?=\r?\n?(?![\w\W]))/ (or /$/m => /(?=\r\n?|\n)/), 
unless you're asking for ONLY \r\n to match! This would also have to change the 
meaning of /./ (without /s) from /[^\n]/ to /[^\r\n]/, which would be confusing 
for people used to the old way of doing things. Would this mean that /^/m would 
match just after a \r (but not in between a \r and a \n)?
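
A minimal sketch of the current /$/ behaviour, assuming PCRE is built with the 
default LF newline setting described in the report (the variable names are only 
for illustration):

```php
<?php
// Without /m, $ matches at the very end of the subject or just before a final "\n".
$atEnd      = preg_match('/line$/', "line");         // end of string
$beforeLf   = preg_match('/line$/', "line\n");       // just before the trailing "\n"
$beforeCrLf = preg_match('/line$/', "line\r\n");     // the "\r" sits between "line" and "\n"
$explicitCr = preg_match('/line\r?$/', "line\r\n");  // naming the optional "\r" explicitly

// Under the default LF setting this is expected to print 1, 1, 0 and 1.
var_dump($atEnd, $beforeLf, $beforeCrLf, $explicitCr);
```

Only the third call fails, because the character following "line" is a \r 
rather than the final \n.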

It would be better to specify what you are matching or not matching rather than 
changing the behaviour of carriage returns. I agree this can make regexes more 
difficult to write and understand in some instances but I would vote to stay 
with the status quo as I believe adding such an option would add to the total 
confusion rather than decrease it. If you write /[^\r\n]+/ it is perfectly 
clear to anyone who understands character classes. Note that the difference 
between /.+/ and /.+/s is already subtle and easily missed.
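
As a rough illustration, using the string from the test script in the original 
report and again assuming the default LF newline setting, the explicit 
character class gives the expected result where /.+/ does not:

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// /./ (without /s) only excludes "\n", so each match keeps its trailing "\r".
preg_match_all('/.+/', $str, $dot);

// The character class excludes both "\r" and "\n", whatever the newline setting.
preg_match_all('/[^\r\n]+/', $str, $explicit);

var_dump($dot[0]);       // three 6-byte strings, each ending in "\r"
var_dump($explicit[0]);  // "line1", "line2", "line3"
```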

Another alternative is to call str_replace("\r\n", "\n", $input) before using 
the regex, which converts Windows-style line endings to Unix-style line endings.
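
A short sketch of that approach, reusing the string from the test script in the 
original report ($normalized is just an illustrative name):

```php
<?php
$input = "line1\r\nline2\r\nline3\r\n";

// Convert Windows-style "\r\n" to Unix-style "\n" before matching.
$normalized = str_replace("\r\n", "\n", $input);

// With no "\r" left in the subject, the original pattern works unchanged.
preg_match_all('/.+/', $normalized, $res);

var_dump($res[0]);  // "line1", "line2", "line3"
```

Note that this only handles \r\n pairs; a lone \r in the input would be left 
untouched.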


Previous Comments:
------------------------------------------------------------------------
[2011-03-11 11:23:30] carsten_sttgt at gmx dot de

Description:
------------
At the moment (PHP bundled) PCRE is built with the default "#define NEWLINE 10".

As a result, "$" matches before "\n" but not before e.g. "\r\n", and the "\r" 
remains as part of the match. This is unexpected for some people, especially on 
Windows. See the example below.

There are 3 solutions:
1)
Building PCRE with "#define NEWLINE -2", or with "#define NEWLINE -1" (because 
PCRE is still built with Unicode support)

2)
Adding an INI option like "pcre.newline=any"

3)
Making PCRE_NEWLINE_ANY, PCRE_NEWLINE_CR, PCRE_NEWLINE_CRLF, PCRE_NEWLINE_LF 
available to the userland (maybe as pattern modifier), like you can do this 
with C using the PCRE library.  

(Well, 1) is not essential if 2) and 3) are available)
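
For comparison with 3): the PCRE library also accepts a newline convention at 
the very start of a pattern, such as (*ANYCRLF). The sketch below assumes the 
PCRE version behind preg_* is recent enough to recognise that syntax; it is not 
a PHP-level modifier or constant.

```php
<?php
$str = "line1\r\nline2\r\nline3\r\n";

// (*ANYCRLF) asks PCRE to treat "\r", "\n" and "\r\n" as newlines for this
// pattern only, so /./ no longer matches "\r".
// Assumption: the linked PCRE library supports this pattern-start option.
preg_match_all('/(*ANYCRLF).+/', $str, $res);

var_dump($res[0]);
```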


Test script:
---------------
<?php
$str = "line1\r\nline2\r\nline3\r\n";

preg_match_all('/.+/', $str, $res);

var_dump($res);
?>


Expected result:
----------------
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(5) "line1"
    [1]=>
    string(5) "line2"
    [2]=>
    string(5) "line3"
  }
}


Actual result:
--------------
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(6) "line1\r"
    [1]=>
    string(6) "line2\r"
    [2]=>
    string(6) "line3\r"
  }
}



------------------------------------------------------------------------


