On 15 March 2011 12:41, Ben Schmidt <mail_ben_schm...@yahoo.com.au> wrote:
>>>>> static $re = '/(^|[^\\\\])\'/';
>>>
>>> Did no one see why the regex was wrong?
>
> I saw what the regex was. I didn't think like you that it was 'wrong'.
>
> Once you unescape the characters in the PHP single-quoted string above
> (where two backslashes count as one, and backslash-quote counts as a
> quote), the actual pattern that reaches the preg_replace function is:
>
> /(^|[^\\])'/
>
>>> RegexBuddy (a windows app) explains regexes VERY VERY well.
>
> What kind of patterns? Does it support PCRE ones?
>
Yep and MANY other flavours (C#, C++, Delphi, Groovy, Java, JavaScript,
MySQL, ...)

>> The important bit (where the problem lies with regard to the regex) is
>> ...
>>
>> Match a single character NOT present in the list below «[^\\\\]»
>> A \ character «\\»
>> A \ character «\\»
>
> This is not the case.
>
> 1. As above, the pattern reaching preg_replace is /(^|[^\\])'/
>
> 2. PCRE, unlike many other regular expression implementations, allows
> backslash-escaping inside character classes (square brackets). So the
> doubled backslash only actually counts as a single backslash character
> to be excluded from the set of characters the atom will match.
>
> There is no error here. (And even if there were two backslashes being
> excluded, of course, it wouldn't hurt anything or change the meaning of
> the pattern.)
>
>> The issue is the word _single_.
>
> I don't think anybody thought otherwise.
>
> The problem was that, to a casual observer, the pattern seems to mean "a
> quote which doesn't already have a backslash before it". I believe this
> was its intent. (And the replacement added the 'missing' backslash.)
>
> But the pattern doesn't mean that. It actually means "a character which
> isn't a backslash, followed by a quote". This is subtly different.
>
> And it's most noticeable when two quotes follow each other in the
> subject string. In
>
> str''str
>
> first the pattern matches "r'" (non-backslash followed by quote), and
> then it keeps searching from that point, i.e. it searches "'str". Since
> this isn't the beginning of the string, and there is no quote following
> a non-backslash character, there are no further matches.
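
The single match described above can be checked with a short PHP sketch. The pattern and the subject string are the ones from the thread; the replacement string is an assumption, since the original preg_replace call is not quoted here.

```php
<?php
// Pattern from the thread; PHP unescapes the literal to /(^|[^\\])'/
// before PCRE ever sees it.
$re = '/(^|[^\\\\])\'/';

// Subject with two adjacent quotes, as in the example above.
$subject = "str''str";

// Collect every match together with its byte offset.
preg_match_all($re, $subject, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    echo "matched \"$text\" at offset $offset\n";   // matched "r'" at offset 2
}

// Assumed replacement (not shown in the thread): keep group 1,
// then emit a literal backslash followed by the quote.
$escaped = preg_replace($re, '$1\\\\\'', $subject);
echo $escaped, "\n";                                // str\''str
```

Only the first quote is escaped. The second quote is skipped because the character in front of it was already consumed by the first match.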
>
> Now, here is a pattern which actually means "a quote which doesn't
> already have a backslash before it" which is achieved by means of a
> lookbehind assertion, which, even when searching the string after the
> first match, "'str", still 'looks back' on the earlier part of the
> string to recognise the second quote is not preceded by a backslash and
> match a second time:
>
> /(^|(?<!\\))'/
>
> As a PHP single-quoted string this is:
>
> '/(^|(?<!\\\\))\'/'
>
> Hope this helps,
>
> Ben.
>
>
>

If I say ...

<?php
echo '/(^|[^\\\\])\'/';
?>

I get ...

/(^|[^\\])'/

which is explained as ...

(^|[^\\])'

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into backreference number 1 «(^|[^\\])»
   Match either the regular expression below (attempting the next alternative only if this one fails) «^»
      Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
   Or match regular expression number 2 below (the entire group fails if this one fails to match) «[^\\]»
      Match any character that is NOT a \ character «[^\\]»
Match the character “'” literally «'»

And that certainly makes a LOT more sense.

Decoding regexes and handling the escaping needed for the language is a
real headache sometimes. Just imagine creating regex code for use by
client-side JavaScript using PHP. 8 \ in a row for a single \ wouldn't
be impossible.

Sorry for the confusion.

-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY
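
For comparison, a minimal PHP sketch of the lookbehind pattern from Ben's mail, run against the same "str''str" subject. The replacement string and the second test subject are assumptions made for illustration; they do not appear in the thread.

```php
<?php
// Lookbehind pattern from Ben's mail, written as a PHP single-quoted string.
$re = '/(^|(?<!\\\\))\'/';

// What PCRE actually receives once PHP has unescaped the literal.
echo $re, "\n";                       // /(^|(?<!\\))'/

$subject = "str''str";

// Assumed replacement (not shown in the thread): a literal backslash
// followed by the quote. The group matches no text, so it is not re-emitted.
$escaped = preg_replace($re, '\\\\\'', $subject);
echo $escaped, "\n";                  // str\'\'str

// Hypothetical second subject: a quote that already has a backslash
// in front of it is left alone.
$already = preg_replace($re, '\\\\\'', "it\\'s");
echo $already, "\n";                  // it\'s
```

The lookbehind does not consume the preceding character, so both quotes in "str''str" are matched and escaped.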