static $re = '/(^|[^\\\\])\'/';
Did no one see why the regex was wrong?
I saw what the regex was. Unlike you, I didn't think it was 'wrong'.
Once you unescape the characters in the PHP single-quoted string above
(where two backslashes count as one, and backslash-quote counts as a
quote), the actual pattern that reaches the preg_replace function is:
/(^|[^\\])'/
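As a quick check of the unescaping described above, here is a minimal sketch that prints the string PHP actually builds from the single-quoted literal. The variable name $re is taken from the original line; everything else is only illustration.

```php
<?php
// The pattern exactly as written in the original source line.
$re = '/(^|[^\\\\])\'/';

// Inside single quotes, \\ becomes one backslash and \' becomes a quote,
// so this prints the pattern that preg_replace receives.
echo $re, "\n";          // /(^|[^\\])'/
echo strlen($re), "\n";  // 12
```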
RegexBuddy (a Windows app) explains regexes VERY VERY well.
What kind of patterns? Does it support PCRE ones?
The important bit (where the problem lies with regard to the regex) is
...
Match a single character NOT present in the list below «[^\\\\]»
A \ character «\\»
A \ character «\\»
This is not the case.
1. As above, the pattern reaching preg_replace is /(^|[^\\])'/
2. PCRE, unlike many other regular expression implementations, allows
backslash-escaping inside character classes (square brackets). So the
doubled backslash only actually counts as a single backslash character
to be excluded from the set of characters the atom will match.
There is no error here. (And even if there were two backslashes being
excluded, of course, it wouldn't hurt anything or change the meaning of
the pattern.)
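To illustrate point 2, here is a small sketch that tests the character class on its own. The name $class and the anchors ^ and $ are my additions for the demonstration, not part of the original pattern.

```php
<?php
// PHP string '/^[^\\\\]$/' reaches PCRE as /^[^\\]$/ :
// a negated class excluding exactly one character, the backslash.
$class = '/^[^\\\\]$/';

var_dump(preg_match($class, 'a'));   // int(1): 'a' is not a backslash
var_dump(preg_match($class, '\\'));  // int(0): a lone backslash is excluded
```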
The issue is the word _single_.
I don't think anybody thought otherwise.
The problem was that, to a casual observer, the pattern seems to mean "a
quote which doesn't already have a backslash before it". I believe this
was its intent. (And the replacement added the 'missing' backslash.)
But the pattern doesn't mean that. It actually means "a character which
isn't a backslash, followed by a quote". This is subtly different.
And it's most noticeable when two quotes follow each other in the
subject string. In
str''str
first the pattern matches "r'" (non-backslash followed by quote), and
then it keeps searching from that point, i.e. it searches "'str". Since
this isn't the beginning of the string, and there is no quote following
a non-backslash character, there are no further matches.
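The walk-through above can be reproduced with a short sketch. It uses preg_match_all rather than the original preg_replace call (whose replacement string is not shown here), simply to list what the pattern matches; $subject and $m are illustrative names.

```php
<?php
// Original pattern; PCRE sees /(^|[^\\])'/
$re = '/(^|[^\\\\])\'/';
$subject = "str''str";

// Collect every match of the whole pattern.
$count = preg_match_all($re, $subject, $m);

echo $count, "\n";   // 1
print_r($m[0]);      // only "r'" is matched; the second quote is skipped
```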
Now, here is a pattern which actually means "a quote which doesn't
already have a backslash before it". It achieves this by means of a
lookbehind assertion. Even when searching the rest of the string after
the first match, "'str", the assertion still 'looks back' at the earlier
part of the string, recognises that the second quote is not preceded by
a backslash, and so matches a second time:
/(^|(?<!\\))'/
As a PHP single-quoted string this is:
'/(^|(?<!\\\\))\'/'
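Here is a minimal sketch of the lookbehind version in use. The replacement string is an assumption on my part (the original replacement is not shown in this thread); it simply puts a backslash in front of each matched quote. The names $fixed, $out and $kept are illustrative.

```php
<?php
// Lookbehind pattern; PCRE sees /(^|(?<!\\))'/
$fixed = '/(^|(?<!\\\\))\'/';

// Assumed replacement: PHP string '\\\\\'' is the three characters \\'
// which preg_replace emits as a single backslash followed by a quote.
$replacement = '\\\\\'';

// Both adjacent quotes are now matched and escaped.
$out = preg_replace($fixed, $replacement, "str''str");
echo $out, "\n";   // str\'\'str

// A quote that already has a backslash before it is left alone.
$kept = preg_replace($fixed, $replacement, 'it\\\'s');
echo $kept, "\n";  // it\'s
```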
Hope this helps,
Ben.
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php