On Sep 6, 10:06 pm, "Mark Tolonen" <metolone+gm...@gmail.com> wrote:
> <gburde...@gmail.com> wrote in message
> news:f98a6057-c35f-4843-9efb-7f36b05b6...@g19g2000yqo.googlegroups.com...
> > If I do this:
> >
> > import re
> > a=re.search(r'hello.*?money', 'hello how are you hello funny money')
> >
> > I would expect a.group(0) to be "hello funny money", since .*? is a
> > non-greedy match. But instead, I get the whole sentence, "hello how
> > are you hello funny money".
> >
> > Is this expected behavior? How can I specify the correct regexp so
> > that I get "hello funny money" ?
>
> A non-greedy match matches the fewest characters before matching the text
> *after* the non-greedy match. For example:
>
> >>> import re
> >>> a=re.search(r'hello.*?money','hello how are you hello funny money and more money')
> >>> a.group(0) # non-greedy stops at the first money
> 'hello how are you hello funny money'
> >>> a=re.search(r'hello.*money','hello how are you hello funny money and more money')
> >>> a.group(0) # greedy keeps going to the last money
> 'hello how are you hello funny money and more money'
>
> This is why it is difficult to use regular expressions to match nested
> objects like parentheses or XML tags. In your case you'll need something
> extra to not match the first hello.
>
> >>> a=re.search(r'(?<!^)hello.*?money','hello how are you hello funny money')
> >>> a.group(0)
> 'hello funny money'
>
> -Mark

I see now. I also understand r's response. But what if there are many
"hello"s before "money", and I don't know how many there are? In other
words, I want to find every occurrence of "money", and for each
occurrence, I want to scan back to the first "hello" I reach (the one
closest to that "money"). How can this be done?
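
For what it's worth, one possible way to get this (a sketch only, not
something suggested in the replies quoted above) is to forbid another
"hello" between the starting "hello" and "money" by putting a negative
lookahead in front of every character the pattern consumes. The sample
string and the names `pattern`, `text` and `matches` are made up for
illustration. It assumes non-overlapping matches are wanted, so a "money"
with no fresh "hello" since the previous match is skipped.

```python
import re

# After "hello", consume one character at a time, but refuse to step
# onto the start of another "hello". A match therefore always begins at
# the "hello" closest to the "money" that ends it.
pattern = re.compile(r'hello(?:(?!hello).)*?money')

# Hypothetical sample text with several "hello"s before each "money".
text = 'hello how are you hello funny money and hello again hello more money'

# The group is non-capturing, so findall returns the whole matches.
matches = pattern.findall(text)
print(matches)  # ['hello funny money', 'hello more money']
```

With the string from the original question, `pattern.search(...)` gives
'hello funny money' no matter how many "hello"s come before it. Note that
`.` does not match newlines unless `re.DOTALL` is passed.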