On Sep 6, 10:22 pm, George Burdell <gburde...@gmail.com> wrote:
> On Sep 6, 10:06 pm, "Mark Tolonen" <metolone+gm...@gmail.com> wrote:
> > <gburde...@gmail.com> wrote in message
> > news:f98a6057-c35f-4843-9efb-7f36b05b6...@g19g2000yqo.googlegroups.com...
> > > If I do this:
> > >
> > > import re
> > > a=re.search(r'hello.*?money', 'hello how are you hello funny money')
> > >
> > > I would expect a.group(0) to be "hello funny money", since .*? is a
> > > non-greedy match. But instead, I get the whole sentence, "hello how
> > > are you hello funny money".
> > >
> > > Is this expected behavior? How can I specify the correct regexp so
> > > that I get "hello funny money" ?
> >
> > A non-greedy match matches the fewest characters before matching the text
> > *after* the non-greedy match. For example:
> >
> > >>> import re
> > >>> a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
> > >>> a.group(0) # non-greedy stops at the first money
> > 'hello how are you hello funny money'
> > >>> a=re.search(r'hello.*money','hello how are you hello funny money and more money')
> > >>> a.group(0) # greedy keeps going to the last money
> > 'hello how are you hello funny money and more money'
> >
> > This is why it is difficult to use regular expressions to match nested
> > objects like parentheses or XML tags. In your case you'll need something
> > extra to not match the first hello.
> >
> > >>> a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
> > >>> a.group(0)
> > 'hello funny money'
> >
> > -Mark
>
> I see now. I also understand r's response. But what if there are many
> "hello"'s before "money," and I don't know how many there are? In
> other words, I want to find every occurrence of "money," and for each
> occurrence, I want to scan back to the first occurrence of "hello."
> How can this be done?
I should say the "closest" occurrence of "hello," to be more clear.
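
One possible way to get the closest "hello" before each "money" is a negative lookahead that refuses to step over another "hello" while scanning forward. The sketch below is only an illustration of that idea, not something proposed in the thread: the names CLOSEST_HELLO and closest_hello_to_money are made up here, and it assumes each "money" has a "hello" of its own after the previous match (matches do not overlap) and that the text has no newlines between them (otherwise re.DOTALL would be needed).

```python
import re

# After 'hello', consume one character at a time, but only while another
# 'hello' does not start at the current position. The match can therefore
# only begin at the last 'hello' before the next 'money'.
CLOSEST_HELLO = re.compile(r'hello(?:(?!hello).)*?money')

def closest_hello_to_money(text):
    """Return each span running from a 'hello' to the nearest following 'money'."""
    return [m.group(0) for m in CLOSEST_HELLO.finditer(text)]

print(closest_hello_to_money('hello how are you hello funny money'))
# ['hello funny money']
```

Because the lookahead is checked before every character, the pattern does not depend on how many earlier "hello"s there are, which the (?<!^) approach quoted above cannot handle in general.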