On Fri, Nov 13, 2009 at 12:47 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> S.Selvam wrote:
>
>> Hi all,
>>
>> 1) I need to remove the <a> tag that is just before the keyword
>> (i.e. some_text2), excluding the others.
>>
>> 2) The input string may or may not contain <a> tags.
>>
>> 3) Sample input:
>>
>> inputstr = """start <a href="some_url">some_text1</a> <a href="">some_text2</a> keyword anything"""
>>
>> 4) I came up with the following regex:
>>
>> p=re.compile(r'(?P<good1>.*?)(\s*<a.*?</a>keyword|\s*keyword)(?P<good2>.*)',re.DOTALL|re.I)
>> s=p.search(inputstr)
>>
>> but the second group matches both <a> tags, while I need to match only
>> the most recent one.
>>
>> I would like to get your suggestions.
>>
>> Note:
>>
>> If I leave group('good1') as greedy, then it matches both <a> tags.
>>
>
> ".*?" can match any number of any character, so it can match any
> intervening "<a>" tags. Try "[^<]*?" instead.
>

Thanks a lot,

p=re.compile(r'(?:<a[^<]*?<\/a>\s*%s)'%(keyword),re.I|re.S)

has done it!

> --
> http://mail.python.org/mailman/listinfo/python-list
>

--
Yours,
S.Selvam
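
For anyone finding this thread later, here is a minimal sketch of the removal step built on the "[^<]*" suggestion above. It is not the exact code from the thread: the function name strip_anchor_before is made up for illustration, and two details are my own assumptions. The keyword is passed through re.escape() so that regex metacharacters in it are treated literally, and it sits in a lookahead so that only the <a> element is removed and the keyword itself stays in the output.

```python
import re

def strip_anchor_before(text, keyword):
    """Remove the <a>...</a> element that directly precedes `keyword`.

    Hypothetical helper, not from the original thread.
    """
    # [^<]* cannot cross another "<", so the match cannot start at an
    # earlier <a> tag and run through intervening tags.
    # (?=...) checks for the keyword without consuming it.
    pattern = re.compile(
        r'<a\b[^<]*</a>\s*(?=%s)' % re.escape(keyword),
        re.I,
    )
    return pattern.sub('', text)

inputstr = """start <a href="some_url">some_text1</a> <a href="">some_text2</a> keyword anything"""
result = strip_anchor_before(inputstr, 'keyword')
print(result)
```

With the sample input from the question, only the second <a> element should be dropped; input without any <a> tags passes through unchanged. Note that "[^<]*" also means an anchor containing nested markup (for example <a><b>x</b></a>) will not match, which may or may not be what you want.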