Mirco Wahab wrote: > Hi, are you the A.Dalke from the Schulten group (VMD) as > listed here: http://www.ks.uiuc.edu/Overview/People/former.cgi
Yes. But I left there nearly a decade ago. > # naive regex '\d+9' > # find some number only if it ends by 9 > my $str="1009900000000000000000000000000000000000"; > > print "non-atomic backtracks: $1\n" if $str =~ / ( \d+9 )/x; > print "atomic: matches $1\n" if $str =~ / ( (?>\d+)9 )/x; What makes me less than enthusiastic about various new regexp features is that they can be written other ways. For example, >>> import re >>> pat = re.compile("([0-8]*9)+") >>> pat.match("1009900000000000000000000000000000000000") <_sre.SRE_Match object at 0x590a0> >>> _.group(0) '10099' >>> pat = re.compile("([0-8]*9)+(?!\d)") >>> pat.match("1009900000000000000000000000000000000000") >>> They are different ways to reach the same goal. Understanding the alternatives requires some non-trivial understanding about how regexp engines work. The diversity of solutions with no clear preference of one over the other overly promotes idiosyncratic patterns and requires that everyone reading the regexps must understand all the special syntax contructions even when a person's working vocabulary only uses a subset of those. > Second line would not backtrack and not find the 9 behind \d+ > because atomic grouping prevents that, so line 1 matches > 10099 and line 2 nothing (but very fast, which is the point). While my pattern is not as fast. It's linearly slower than yours as it does the backtracking. (Unless the engine is very smart.) That factor is not something I'm concerned with. I can also get performance without backtracking (though there is setup for backtracking) using >>> pat = re.compile("([0-8]*9)+([0-8]*)") >>> m = pat.match("1009900000000000000000000000000000000000") >>> if m.group(2): print "does not end in a 9" ... does not end in a 9 >>> >>> m = pat.match("10099000000000000000000000000000000000009a") >>> if m.group(2): print "does not end in a 9" ... >>> > Why is there a named group count > restriction in Python? Any reasons > one should know? While not definite, I think it introduces an ambiguity with octal escape codes. (*That's* something to omit in Python 3000!) Here's the quote from perlre There is no limit to the number of captured substrings that you may use. However Perl also uses \10, \11, etc. as aliases for \010, \011, etc. (Recall that 0 means octal, so \011 is the character at number 9 in your coded character set; which would be the 10th character, a hori- zontal tab under ASCII.) Perl resolves this ambiguity by interpreting \10 as a backreference only if at least 10 left parentheses have opened before it. Likewise \11 is a backreference only if at least 11 left parentheses have opened before it. And so on. \1 through \9 are always interpreted as backreferences. Python doesn't have "\10", "\11", etc. as aliases for "\010", "\011", etc. >>> re.compile("\10") <_sre.SRE_Pattern object at 0x575c0> >>> pat = re.compile(r"\10") Traceback (most recent call last): File "<stdin>", line 1, in ? File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 180, in compile return _compile(pattern, flags) File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 227, in _compile raise error, v # invalid expression sre_constants.error: bogus escape: '\\10' >>> pat = re.compile(r"\010") >>> so there is no ambiguity until reaching \100. >>> pat = re.compile(r"\100") >>> pat.match("@") <_sre.SRE_Match object at 0x232560> >>> Most likely /F (who implemented the regex engine) decided to refuse the temptation to guess the meaning, eg by parens counting as Perl does. Try this variation of your program to see how Perl changes the meaning of "\100" based on the input string for (my $i=98; $i<=103; $i++) { my $re = '(\d+)\D' x $i . '\100'; my $text = join ' ', 0..$i; $text .= " @"; print "Testing $i\n"; if ($text =~ /$re/) { print " Number of matches: $+\n"; print " Last group: ${$i}\n"; } else { print "No matches\n"; } } I get Testing 98 Number of matches: 98 Last group: 98 Testing 99 Number of matches: 99 Last group: 99 Testing 100 No matches Testing 101 No matches Testing 102 No matches Testing 103 No matches Andrew [EMAIL PROTECTED] -- http://mail.python.org/mailman/listinfo/python-list