Re: Something confusing about non-greedy reg exp match

George Burdell Sun, 06 Sep 2009 20:26:34 -0700

On Sep 6, 10:06 pm, "Mark Tolonen" <[email protected]> wrote:
> <[email protected]> wrote in message
>
> news:f98a6057-c35f-4843-9efb-7f36b05b6...@g19g2000yqo.googlegroups.com...
>
> > If I do this:
>
> > import re
> > a=re.search(r'hello.*?money',  'hello how are you hello funny money')
>
> > I would expect a.group(0) to be "hello funny money", since .*? is a
> > non-greedy match. But instead, I get the whole sentence, "hello how
> > are you hello funny money".
>
> > Is this expected behavior? How can I specify the correct regexp so
> > that I get "hello funny money" ?
>
> A non-greedy match matches the fewest characters needed before the text
> *after* it can match, but the search still starts at the leftmost "hello";
> non-greedy only changes where the match ends.  For example:
>
> >>> import re
> >>> a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
> >>> a.group(0)  # non-greedy stops at the first money
> 'hello how are you hello funny money'
> >>> a=re.search(r'hello.*money','hello how are you hello funny money and more money')
> >>> a.group(0)  # greedy keeps going to the last money
> 'hello how are you hello funny money and more money'
>
> This is why it is difficult to use regular expressions to match nested
> objects like parentheses or XML tags.  In your case you'll need something
> extra to not match the first hello.
>
> >>> a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
> >>> a.group(0)
> 'hello funny money'
>
> -Mark


I see now. I also understand r's response. But what if there are many
"hello"s before "money", and I don't know how many there are? In
other words, I want to find every occurrence of "money", and for each
occurrence, I want to scan back to the nearest preceding "hello".
How can this be done?
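
One approach that might work, as a rough sketch rather than a definitive
answer: forbid a new "hello" from starting anywhere between the opening
"hello" and "money", using a negative lookahead before each character. The
sample text and the names `text`, `pattern` and `matches` below are made up
for illustration.

```python
import re

# Hypothetical sample text with several "hello"s before each "money".
text = 'hello how are you hello funny money and hello again hello more money'

# "hello", then characters one at a time, each checked by (?!hello) so that
# no later "hello" can begin inside the match, then "money".
pattern = re.compile(r'hello(?:(?!hello).)*?money')

matches = pattern.findall(text)
print(matches)  # ['hello funny money', 'hello more money']
```

Two caveats: findall() returns non-overlapping matches, so a second "money"
with no new "hello" in front of it is not reported, and "." does not match
newlines unless re.DOTALL is passed.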
-- 
http://mail.python.org/mailman/listinfo/python-list
