On Apr 27, 1:33 am, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Apr 27, 1:33 am, proctor <[EMAIL PROTECTED]> wrote:
>
> > hello,
> >
> > i have a regex:  rx_test = re.compile('/x([^x])*x/')
> >
> > which is part of this test program:
> >
> > ============
> >
> > import re
> >
> > rx_test = re.compile('/x([^x])*x/')
> >
> > s = '/xabcx/'
> >
> > if rx_test.findall(s):
> >         print rx_test.findall(s)
> >
> > ============
> >
> > i expect the output to be ['abc'] however it gives me only the last
> > single character in the group: ['c']
> >
> > C:\test>python retest.py
> > ['c']
> >
> > can anyone point out why this is occurring?  i can capture the entire
> > group by doing this:
> >
> > rx_test = re.compile('/x([^x]+)*x/')
> > but why isn't the 'star' grabbing the whole group?  and why isn't each
> > letter 'a', 'b', and 'c' present, either individually, or as a group
> > (group is expected)?
> >
> > any clarification is appreciated!
> >
> > sincerely,
> > proctor
>
> As Josiah already pointed out, the * needs to be inside the grouping
> parens.
>
> Since re's do lookahead/backtracking, you can also write:
>
> rx_test = re.compile('/x(.*?)x/')
>
> The '?' is there to make sure the .* repetition stops at the first
> occurrence of x/.
>
> -- Paul
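For reference, the three variants discussed in the quoted messages can be compared side by side. This is only a sketch in Python 3 syntax (the original program uses the Python 2 print statement), and the variable names are made up for illustration. The "star inside" pattern is one way of writing what Josiah's advice describes; his exact pattern is not quoted here.

```python
import re

s = '/xabcx/'

# Star outside the parentheses: the group is matched once per character,
# and a repeated group keeps only its last iteration.
star_outside = re.findall(r'/x([^x])*x/', s)

# Star inside the parentheses: the group is matched once and spans
# the whole run of non-x characters.
star_inside = re.findall(r'/x([^x]*)x/', s)

# Non-greedy variant from Paul's reply: .*? stops at the first 'x/'.
non_greedy = re.findall(r'/x(.*?)x/', s)

print(star_outside)  # ['c']
print(star_inside)   # ['abc']
print(non_greedy)    # ['abc']
```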
i am working through an example from the O'Reilly book Mastering Regular
Expressions (2nd edition) by Jeffrey Friedl.  my post was a snippet from a
regex to match C comments.  every 'x' in the regex represents a 'star' in
actual usage, so that backslash escaping is not needed in the example (on
page 275).  it looks like this:

===========

/x([^x]|x+[^/x])*x+/

it is supposed to match '/x', the opening delimiter, then

(
either anything that is 'not x',

or,

'x' one or more times, followed by one character that is 'not a slash
or an x'
) any number of times (the 'star')

followed finally by the closing delimiter.

===========

this does not seem to work in python the way i understand it should from
the book, and i simplified the example in my first post to concentrate on
just one part of the alternation that i felt was not acting as expected.

so my question remains: why doesn't the star quantifier seem to grab all
the data?  isn't findall() intended to return all matches?  i would expect
either 'abc' or 'a', 'b', 'c' or at least just 'a' (because that would be
the first match).  why does it give only one letter, and at that, the
/last/ letter in the sequence??

thanks again for replying!

sincerely,
proctor
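A short sketch of what seems to be happening, assuming Python 3 and the standard re module (the variable names below are only illustrative). The pattern from the book does match the whole comment. What differs is what findall() reports: when a pattern contains exactly one capturing group, findall() returns that group's text for each match rather than the whole match, and a group that sits under a star holds only the text of its final iteration. Making the alternation non-capturing with (?:...), or wrapping the repetition in an outer group, returns the full text.

```python
import re

s = '/xabcx/ and /xd xx ex/'

# Pattern as given: the capturing group is repeated by the star, so
# findall() reports only the last iteration of the group for each match.
as_given = re.findall(r'/x([^x]|x+[^/x])*x+/', s)

# Non-capturing alternation: with no groups, findall() returns whole matches.
whole_matches = re.findall(r'/x(?:[^x]|x+[^/x])*x+/', s)

# Outer capturing group around the repetition: returns the comment bodies.
bodies = re.findall(r'/x((?:[^x]|x+[^/x])*)x+/', s)

# The same idea with real stars, escaped, on a C-style comment.
c_comments = re.findall(r'/\*(?:[^*]|\*+[^/*])*\*+/', 'int a; /* abc */ int b;')

print(as_given)       # ['c', 'e']
print(whole_matches)  # ['/xabcx/', '/xd xx ex/']
print(bodies)         # ['abc', 'd xx e']
print(c_comments)     # ['/* abc */']
```

So the match itself is the one the book describes; only the captured group looks truncated. Calling group(0) on a match object from search() or finditer() is another way to see the whole match without changing the pattern.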