On Fri, 09 Feb 2007 22:27:37 +0100, in php.internals
[EMAIL PROTECTED] (Lukas Kahwe Smith) wrote:

>Well I remember that Andrei once brought up the idea of an ereg wrapper 
>around pcre, but IIRC the idea was dropped because there would be tons 
>of subtle issues from edge cases. So the decision was made to simply 
>turn it into a pecl extension to be installed by the user as needed 
>without shiny new unicode support. Not sure if that means that ereg 
>would not be made to work with PHP6 at all ...

Maybe it would be time to re-check the preg arguments.

Currently the preg pattern input is a quagmire of partial perl style
and ordinary function arguments. In perl, delimiters make sense
(e.g. s/foo/bar/i , s_foo_bar_i , s(foo)(bar)i , s<foo><bar>i ). The
delimiters are an easy way to separate arguments. But in PHP we are
already separating them as ordinary function arguments, though the
separation is a bit spurious:

s/foo/bar/i would be changed to preg_replace('/foo/i','bar',$input) .

So, we separate the replace part, but not the match or the flags. We
partially pretend that we are working perl-style, but obviously, we
aren't.

For instance, "/foo\Q$bar\E/" would not work exactly as /foo\Q$bar\E/
would in perl if $bar contained an \E (preg_match() would just receive
the already interpolated string):

$ perl -le '$foo = "\\E"; $_ = " \\E "; print m/\Q $foo \E/;'
1
$ php -r '$foo = "\\E"; print preg_match("/\Q $foo \E/"," \\E ");'
0
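
For comparison, a minimal sketch of the same check with the
interpolated value escaped through preg_quote() instead of \Q...\E.
The variable names $subject and $pattern are only for illustration:

```php
<?php
$foo     = '\\E';      // two characters: a backslash and an E
$subject = ' \\E ';    // space, backslash, E, space

// preg_quote() escapes the regex metacharacters in $foo (here the
// backslash) and, via its second argument, the delimiter as well.
$pattern = '/ ' . preg_quote($foo, '/') . ' /';

print preg_match($pattern, $subject) . "\n";   // prints 1
```

Because the escaping happens before the string reaches PCRE, a \E
inside $foo can no longer end the quoted part early.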

Furthermore, the use of the /e flag also gives results that look
strange at first sight:

<?php
$string = <<<EOD
A "little" test and a 'little' test
EOD;
print preg_replace('_(.*)_e',"'$1'",$string)."\n";
print preg_replace('_(.*)_e','"$1"',$string)."\n";
?>

I think it confuses more than it helps that we require the delimiters
and the current arguments (the preg_* functions require match-and-flags
as the first argument, which itself consists of two arguments: the match
and the flags).

As mentioned, there is no way \Q and \E could work exactly as in perl.
The same goes for the usage of the e flag.

There are already alternatives, e.g. preg_quote() and
preg_replace_callback() to overcome these shortcomings. But I don't
think people would be encouraged to use them without first crashing
into the shortcomings of the ordinary usage with \Q, \E and /e.
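
As a rough sketch of the preg_replace_callback() route: the matched
text is handed to a PHP function as an ordinary array, so no quoting
of the match inside evaluated code is involved. The pattern, the
callback body and the name $result are made up for illustration, and
the closure syntax assumes a PHP version that supports anonymous
functions:

```php
<?php
$string = 'A "little" test and a \'little\' test';

// $m[0] is the whole match, $m[1] the first capture group.
$result = preg_replace_callback(
    '/"([^"]*)"/',
    function ($m) {
        return '<' . strtoupper($m[1]) . '>';
    },
    $string
);

print $result . "\n";   // prints: A <LITTLE> test and a 'little' test
```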

Unfortunately there isn't an easy fix. PHP isn't perl; for the above
to work would require major changes in the PHP language parser. The
nice way would be to get rid of delimiters and separate the flag into
its own argument, but it would be a hell of a BC break if the input
form suddenly changed.
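
To illustrate what such an input form could look like without
touching the existing functions, here is a userland sketch.
regex_replace() is a hypothetical name, not an existing or proposed
PHP function, and picking an unused delimiter behind the scenes is
just one assumed way to do it:

```php
<?php
// Hypothetical wrapper: pattern and flags are separate arguments and
// the caller never sees a delimiter.
function regex_replace($pattern, $flags, $replacement, $subject)
{
    // Use the first candidate delimiter that does not occur in the
    // pattern, so nothing inside the pattern has to be escaped.
    foreach (array('/', '#', '~', '%', '!', '@') as $delimiter) {
        if (strpos($pattern, $delimiter) === false) {
            return preg_replace(
                $delimiter . $pattern . $delimiter . $flags,
                $replacement,
                $subject
            );
        }
    }
    throw new InvalidArgumentException('No free delimiter for: ' . $pattern);
}

// s/foo/bar/i in perl would then read:
print regex_replace('foo', 'i', 'bar', 'Foo and FOO') . "\n";   // prints: bar and bar
```

This only papers over the delimiter issue in userland; it does not
change how \Q, \E or /e behave.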

Nonetheless the current PCRE functions lead to confusion and
weirdness as long as the perl syntax is mixed with php.

-- 
- Peter Brodersen

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
