S.Selvam wrote:
Hi all,
1) I need to remove the <a> tag which is just before the keyword (i.e.
some_text2), excluding others.
2) The input string may or may not contain <a> tags.
3) Sample input:
inputstr = """start <a href="some_url">some_text1</a> <a
href="">some_text2</a> keyword anything"""
4) I came up with the following regex,
p=re.compile(r'(?P<good1>.*?)(\s*<a.*?</a>keyword|\s*keyword)(?P<good2>.*)',re.DOTALL|re.I)
s=p.search(inputstr)
but the second group matches both <a> tags, while I need to match only
the most recent one.
I would like to get your suggestions.
Note:
If I leave group('good1') greedy, then it matches both <a> tags.
".*?" can match any number of any characters, including "<", so the
"<a.*?</a>" part can start at the first "<a" and run across any
intervening "<a>" tags. Try "[^<]*?" instead, which cannot cross a "<".
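
Here is a minimal sketch of that suggestion applied to the sample input. Two details are my own assumptions rather than part of the original pattern: I allow optional whitespace between "</a>" and "keyword" (the sample has a space there, which the posted regex does not allow for), and I make the middle group non-capturing. The input is also written on one line instead of being wrapped inside the tag.

```python
import re

# Sample input from the question, joined onto one logical line.
inputstr = ('start <a href="some_url">some_text1</a> '
            '<a href="">some_text2</a> keyword anything')

# "[^<]*" cannot cross a "<", so the optional <a>...</a> part can only
# cover a single tag: the one directly in front of the keyword.
# The "\s*" before "keyword" is an assumption based on the sample input.
pattern = re.compile(
    r'(?P<good1>.*?)(?:\s*<a[^<]*</a>\s*keyword|\s*keyword)(?P<good2>.*)',
    re.DOTALL | re.I,
)

match = pattern.search(inputstr)
print(match.group('good1'))  # start <a href="some_url">some_text1</a>
print(match.group('good2'))  #  anything

# Input without any <a> tags still matches through the second alternative.
plain = pattern.search('start keyword anything')
print(plain.group('good1'))  # start
```

Note that "[^<]*" only works while the link text itself contains no nested tags such as <b>; for arbitrary HTML a real parser is likely the safer route.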