Edit report at https://bugs.php.net/bug.php?id=54223&edit=1

 ID:                 54223
 Comment by:         danielklein at airpost dot net
 Reported by:        carsten_sttgt at gmx dot de
 Summary:            enhance / change newline behavior
 Status:             Open
 Type:               Feature/Change Request
 Package:            PCRE related
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

This is very old, well-defined behaviour. /$/ is equivalent to /(?=\n?(?![\w\W]))/; that is, it matches just before a final newline (one with no characters after it) or at end-of-string (no more characters). What you are asking for is /(?=\r?\n?(?![\w\W]))/ (or /$/m => /(?=\r\n?|\n)/), unless you're asking for ONLY \r\n to match!

This would also have to change the meaning of /./ (without /s) from /[^\n]/ to /[^\r\n]/, which would be confusing for people used to the old way of doing things. Would this mean that /^/m would match just after a \r (but not in between a \r and a \n)?

It would be better to specify what you are matching or not matching rather than changing the behaviour of carriage returns. I agree this can make regexes more difficult to write and understand in some instances, but I would vote to stay with the status quo, as I believe adding such an option would add to the total confusion rather than decrease it. If you write /[^\r\n]+/, it is perfectly clear to anyone who understands character classes. Note that the difference between /.+/ and /.+/s is already subtle and easily missed.

Another alternative is to call str_replace("\r\n", "\n", $input) before using the regex, which converts Windows-style line endings to Unix-style line endings.


Previous Comments:
------------------------------------------------------------------------
[2011-03-11 11:23:30] carsten_sttgt at gmx dot de

Description:
------------
At the moment, the PCRE bundled with PHP is built with the default "#define NEWLINE 10". As a result, "$" recognizes "\n" as a newline but not, e.g., "\r\n", so the "\r" remains part of the match. This is unexpected for some people, especially on Windows. See the example below.

There are 3 solutions:

1) Building PCRE with "#define NEWLINE -2", or with "#define NEWLINE -1" (because PCRE is built with Unicode support)
2) Adding an INI option like "pcre.newline=any"
3) Making PCRE_NEWLINE_ANY, PCRE_NEWLINE_CR, PCRE_NEWLINE_CRLF and PCRE_NEWLINE_LF available to userland (maybe as pattern modifiers), as you can do in C using the PCRE library.

(Well, 1) is not essential if 2) and 3) are available.)

Test script:
---------------
<?php
$str = "line1\r\nline2\r\nline3\r\n";
preg_match_all('/.+/', $str, $res);
var_dump($res);
?>

Expected result:
----------------
array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(5) "line1"
    [1]=>
    string(5) "line2"
    [2]=>
    string(5) "line3"
  }
}

Actual result:
--------------
(each match ends in a carriage return, written here as \r)

array(1) {
  [0]=>
  array(3) {
    [0]=>
    string(6) "line1\r"
    [1]=>
    string(6) "line2\r"
    [2]=>
    string(6) "line3\r"
  }
}

------------------------------------------------------------------------
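
For reference, the two workarounds suggested in the new comment (an explicit character class, and normalising line endings before matching) can be sketched against the test script's input as follows. This is only a minimal illustration under the current default newline setting; the variable names $byClass, $normalised and $byReplace are made up for this example and do not appear in the report.

```php
<?php
// Same input as the test script: Windows-style (\r\n) line endings.
$str = "line1\r\nline2\r\nline3\r\n";

// Workaround 1: exclude both \r and \n explicitly instead of relying on ".".
preg_match_all('/[^\r\n]+/', $str, $byClass);

// Workaround 2: convert \r\n to \n first, then keep the original pattern.
$normalised = str_replace("\r\n", "\n", $str);
preg_match_all('/.+/', $normalised, $byReplace);

// Both dumps should show three strings of length 5 with no trailing \r.
var_dump($byClass[0]);
var_dump($byReplace[0]);
```

Either variant should give the "Expected result" from the report without any change to how PCRE is built or configured. The character class handles lone \r characters as well, whereas the str_replace() call only rewrites \r\n pairs.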