New submission from Mateusz Dobrowolny:

Python 3.4.1, Windows.
help(re.findall) shows me:

findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more capturing groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
Empty matches are included in the result.

It seems that information about greedy (repeated) groups, i.e. (regular_expression)*, is missing.
Please take a look at my example:

-------------EXAMPLE-------------
import re

text = 'To configure your editing environment, use the Editor settings page and its child pages. There is also a ' \
       'Quick Switch Scheme command that lets you change color schemes, themes, keymaps, etc. with a couple of ' \
       'keystrokes.'
print('Text to be searched: \n' + text)
print('\nSearching method: re.findall()')

# One capturing group, not repeated: findall() returns that group for each match.
regexp_result = re.findall(r'\w+(\s+\w+)', text)
print('\nRegexp rule: r\'\\w+(\\s+\\w+)\' \nFound: ' + str(regexp_result))
print('This works as expected: findall() returns a list of groups (\\s+\\w+), and the groups are from non-overlapping matches.')

# The same group, now repeated with *: only its last iteration is returned.
regexp_result = re.findall(r'\w+(\s+\w+)*', text)
print('\nHow about making the group greedy? Here we go: \nRegexp rule: r\'\\w+(\\s+\\w+)*\' \nFound: ' + str(regexp_result))
print('This is a little bit unexpected for me: findall() returns THE LAST MATCHING group only, parsing from left to right.')

# An extra group around the whole RE: the first element of each tuple is the full match.
regexp_result_list = re.findall(r'(\w+(\s+\w+)*)', text)
first_group = [i for i, j in regexp_result_list]
print('\nThe solution is to put an extra group around the whole RE: \nRegexp rule: r\'(\\w+(\\s+\\w+)*)\' \nFound: ' + str(first_group))
print('So finally I can get all strings I am looking for, just like expected from the FINDALL method, by accessing first elements in tuples.')
----------END OF EXAMPLE-------------

I found the solution when practicing on this page: http://regex101.com/#python
Entering:
REGULAR EXPRESSION: \w+(\s+\w+)*
TEST STRING: To configure your editing environment, use the Editor settings page and its child pages. There is also a Quick Switch Scheme command that lets you change color schemes, themes, keymaps, etc.
with a couple of keystrokes.

it showed me on the right side, with nice color-coding:

1st Capturing group (\s+\w+)*
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data

I think some information regarding repeated groups should be included in the Python documentation as well.

BTW: I have one extra question.
Searching for 'findall' in this tracker I found this issue: http://bugs.python.org/issue3384
It looks like the information about ordering is no longer in the 3.4.1 documentation. Shouldn't it be there?

Kind Regards

----------
assignee: docs@python
components: Documentation
messages: 226534
nosy: Mateusz.Dobrowolny, docs@python
priority: normal
severity: normal
status: open
title: re.findall() documentation lacks information about finding THE LAST iteration of reoeated capturing group (greedy)
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22353>
_______________________________________
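
The regex101 note quoted in the report also mentions a non-capturing group as an alternative to wrapping the pattern in an extra group. A minimal sketch of that variant follows; the shortened sample string is an assumption made for illustration and is not the text from the report.

```python
import re

# Shortened sample text (an assumption for illustration only).
text = "one two three, four five"

# Repeated capturing group: findall() returns only the last iteration
# of the group for each match.
last_only = re.findall(r"\w+(\s+\w+)*", text)

# Non-capturing group (?:...): the pattern has no capturing groups,
# so findall() returns the whole of each match.
whole = re.findall(r"\w+(?:\s+\w+)*", text)

print(last_only)  # [' three', ' five']
print(whole)      # ['one two three', 'four five']
```

With the non-capturing form there are no tuples to unpack, so the extra step of picking the first element of each tuple is not needed.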