On 4/21/2011 6:16 AM, Neil Cerutti wrote:
On 2011-04-20, John Nagle <na...@animats.com> wrote:
      Findall does something a bit different. It returns a list of
matches of the entire pattern, not repeats of groups within
the pattern.

      Consider a regular expression for matching domain names:

>>> import re
>>> kre = re.compile(r'^([a-zA-Z0-9\-]+)(?:\.([a-zA-Z0-9\-]+))+$')
>>> s = 'www.example.com'
>>> ms = kre.match(s)
>>> ms.groups()
('www', 'com')
>>> msall = kre.findall(s)
>>> msall
[('www', 'com')]

This is just a simple example.  But it illustrates an unnecessary
limitation.  The matcher can do the repeated matching; you just can't
get the results out.
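
As a rough illustration of what working around this looks like today, here is a minimal sketch using only the standard re module. It validates the whole string with the pattern above, then pulls out the labels with a second pattern. The names label_re and domain_labels are made up for this example and are not part of any library.

```python
import re

# Same pattern as above: a label, then one or more ".label" repeats.
kre = re.compile(r'^([a-zA-Z0-9\-]+)(?:\.([a-zA-Z0-9\-]+))+$')

# Hypothetical second pattern that matches a single label.
label_re = re.compile(r'[a-zA-Z0-9\-]+')

def domain_labels(s):
    """Hypothetical helper: return every label, or None if s does not match."""
    if kre.match(s) is None:
        return None
    # The matcher already walked over each label once; this second
    # pass repeats that work only to get the pieces out.
    return label_re.findall(s)

labels = domain_labels('www.example.com')
print(labels)
```

The middle label 'example', which groups() drops, only shows up because the string is scanned a second time.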

Thanks for the further explanation.

Assuming a fake API that returned multiple group matches as a
tuple:

>>> print(re.match(r"^([a-z])+$", "abcdef").groups())
(('a', 'b', 'c', 'd', 'e', 'f'),)

I was thinking of applying findall something like this, but you
have to make multiple calls:

>>> s = "abcdef"
>>> m = re.match(r"^[a-z]+$", s)
>>> if m:
...     print(re.findall(r"[a-z]", m.group()))
...
['a', 'b', 'c', 'd', 'e', 'f']

I can see that getting really annoying. Is there a better way to
make multiple group matches accessible without adding a third
element type as a group element?

    The most elegant solution would be to have a regular expression
function that returned a tree of tuples or lists.  Then you could
express an entire language syntax as a regular expression and
get out a parse tree.

    Since the regular expression system is actually doing that work,
then discarding the results, it seems a reasonable extension.
I'm not suggesting extending regular expression matching itself,
just the way the results are stored.
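
To make the shape of such a result concrete, here is a small sketch, again with only the standard re module. It is not how the re engine works internally and not a proposed API. The helper match_with_repeats is hypothetical: it matches a head pattern once, then applies a second pattern repeatedly from the current position, and keeps every capture instead of only the last one.

```python
import re

def match_with_repeats(head, repeated, text):
    """Hypothetical helper: match `head` once at the start of `text`,
    then `repeated` one or more times up to the end of `text`.
    Returns (head_groups, [groups_of_each_repeat, ...]) or None."""
    head_re = re.compile(head)
    rep_re = re.compile(repeated)
    m = head_re.match(text)
    if m is None:
        return None
    pos = m.end()
    captures = []
    while pos < len(text):
        r = rep_re.match(text, pos)
        # Give up on a failed or empty match so the loop always advances.
        if r is None or r.end() == pos:
            return None
        captures.append(r.groups())
        pos = r.end()
    if not captures:
        return None  # the repeated part must occur at least once
    return (m.groups(), captures)

LABEL = r'[a-zA-Z0-9\-]+'
result = match_with_repeats(r'(%s)' % LABEL, r'\.(%s)' % LABEL,
                            'www.example.com')
print(result)
```

For 'www.example.com' this gives a nested structure with 'www' in the first tuple and one tuple per repeat after it. Unlike a real regular expression, the sketch never backtracks between the head and the repeats, so it only behaves the same for patterns like this one where the pieces cannot overlap.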

                                John Nagle

--
http://mail.python.org/mailman/listinfo/python-list