New submission from Mateusz Dobrowolny:

Python 3.4.1, Windows.
help(re.findall) shows me:
findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.
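
    To make the docstring above concrete, here is a small sketch of the
    cases it does describe. The sample string and variable names are made
    up for illustration; only re.findall() itself comes from the library.

```python
import re

sample = 'a1 b2 c3'  # hypothetical sample text

# No capturing group: each item is the whole match.
no_group = re.findall(r'[a-z]\d', sample)

# One capturing group: each item is the text of that group.
one_group = re.findall(r'[a-z](\d)', sample)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'([a-z])(\d)', sample)

# Empty matches are included in the result.
with_empty = re.findall(r'\d*', 'a1')

print(no_group)    # ['a1', 'b2', 'c3']
print(one_group)   # ['1', '2', '3']
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]
print(with_empty)  # ['', '1', '']
```

    None of these cases says what happens when the single group is
    repeated, which is the case described below.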

It seems that information about repeated capturing groups, i.e.
(regular_expression)*, is missing.
Please take a look at my example:

-------------EXAMPLE-------------
import re

text = ('To configure your editing environment, use the Editor settings page '
        'and its child pages. There is also a '
        'Quick Switch Scheme command that lets you change color schemes, '
        'themes, keymaps, etc. with a couple of '
        'keystrokes.')
print('Text to be searched: \n' + text)
print('\nSearching method: re.findall()')

# One capturing group, not repeated: findall() returns that group per match.
regexp_result = re.findall(r'\w+(\s+\w+)', text)
print('\nRegexp rule: r\'\\w+(\\s+\\w+)\' \nFound: ' + str(regexp_result))
print('This works as expected: findall() returns a list of groups '
      '(\\s+\\w+), and the groups are from non-overlapping matches.')

# The same group, now repeated with the greedy * quantifier.
regexp_result = re.findall(r'\w+(\s+\w+)*', text)
print('\nHow about making the group greedy? Here we go: \nRegexp rule: '
      'r\'\\w+(\\s+\\w+)*\' \nFound: ' + str(regexp_result))
print('This is a little bit unexpected for me: findall() returns THE LAST '
      'MATCHING group only, parsing from left to right.')

# Extra outer group: each item is a tuple (whole match, last iteration).
regexp_result_list = re.findall(r'(\w+(\s+\w+)*)', text)
first_group = [whole for whole, _last in regexp_result_list]
print('\nThe solution is to put an extra group around the whole RE: \n'
      'Regexp rule: r\'(\\w+(\\s+\\w+)*)\' \nFound: ' + str(first_group))
print('So finally I can get all strings I am looking for, just like expected '
      'from the FINDALL method, by accessing first elements in tuples.')
----------END OF EXAMPLE-------------


I found the solution when practicing on this page:
http://regex101.com/#python
Entering:
REGULAR EXPRESSION: \w+(\s+\w+)*
TEST STRING: To configure your editing environment, use the Editor settings 
page and its child pages. There is also a Quick Switch Scheme command that lets 
you change color schemes, themes, keymaps, etc. with a couple of keystrokes.

it showed me on the right side with nice color-coding:
1st Capturing group (\s+\w+)*
Quantifier: Between zero and unlimited times, as many times as possible, giving 
back as needed [greedy]
Note: A repeated capturing group will only capture the last iteration. Put a 
capturing group around the repeated group to capture all iterations or use a 
non-capturing group instead if you're not interested in the data
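
The note above mentions a non-capturing group as the alternative. A minimal
sketch of that, next to re.finditer() with group(0), which also gives the
whole match; the shortened sample text and the variable names are only for
illustration:

```python
import re

text = 'use the Editor settings page, and its child pages.'  # shortened sample

# Repeated capturing group: only the last iteration is kept for each match.
last_only = re.findall(r'\w+(\s+\w+)*', text)

# Non-capturing group (?:...): no groups, so findall() returns whole matches.
whole = re.findall(r'\w+(?:\s+\w+)*', text)

# finditer() yields match objects; group(0) is the whole match in any case.
whole_iter = [m.group(0) for m in re.finditer(r'\w+(\s+\w+)*', text)]

print(last_only)   # [' page', ' pages']
print(whole)       # ['use the Editor settings page', 'and its child pages']
print(whole_iter)  # same as whole
```

With the non-capturing form there is no need to unpack tuples afterwards.
If a match consists of a single word, the repeated group does not
participate at all and findall() reports it as an empty string.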




I think some information regarding repeated groups should be included in the
Python documentation as well.

BTW: I have one extra question.
Searching for 'findall' in this tracker I found this issue:
http://bugs.python.org/issue3384

It looks like the ordering information is no longer in the 3.4.1
documentation. Shouldn't this be there?

Kind Regards

----------
assignee: docs@python
components: Documentation
messages: 226534
nosy: Mateusz.Dobrowolny, docs@python
priority: normal
severity: normal
status: open
title: re.findall() documentation lacks information about finding THE LAST 
iteration of repeated capturing group (greedy)
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22353>
_______________________________________