alister writes:

> On Tue, 14 Jun 2016 20:28:24 -0700, Yubin Ruan wrote:
>
>> Hi everyone,
>> I am struggling writing a right regex that match what I want:
>>
>> Problem Description:
>>
>> Given a string like this:
>>
>> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
>> true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a>
>> true_tail"
>>
>> I want to match the all the text surrounded by those "<a> </a>",
>> but only if those "<a> </a>" locate **in some distance** behind
>> "true_head". That is, I expect to result to be like this:
>>
>> >>>import re
>> >>>result = re.findall("the_regex",string)
>> >>>print result
>> ["ccc","ddd","eee"]
>>
>> How can I write a regex to match that?
>> I have try to use the **positive lookbehind assertion** in python regex,
>> but it does not allowed variable length of lookbehind.
>>
>> Thanks in advance,
>> Ruan
>
> don't try to use regex to parse html it wont work reliably
> i am surprised no one has mentioned beautifulsoup yet, which is probably
> what you require.

Nothing in the question indicates that the data is HTML.
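
If the data really is plain text shaped like the example, one possible way around the fixed-width lookbehind restriction is to avoid lookbehind altogether and work in two steps: first isolate the text between "true_head" and "true_tail", then collect the "<a> </a>" contents from that part only. This is just a sketch under that assumption; the function name tagged_after_head and its head/tail parameters are made up for illustration, and it only looks at the first "true_head" ... "true_tail" region it finds.

```python
import re

string = ("false_head <a>aaa</a> <a>bbb</a> false_tail "
          "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> "
          "true_tail")


def tagged_after_head(text, head="true_head", tail="true_tail"):
    """Return the contents of every <a>...</a> found between head and tail.

    Hypothetical helper: assumes head and tail are literal markers and
    that only the first head ... tail region matters.
    """
    # Step 1: isolate the region between the two markers (non-greedy,
    # so it stops at the first tail that follows the head).
    region = re.search(re.escape(head) + r"(.*?)" + re.escape(tail),
                       text, re.DOTALL)
    if region is None:
        return []
    # Step 2: an ordinary findall on that region only.
    return re.findall(r"<a>(.*?)</a>", region.group(1), re.DOTALL)


result = tagged_after_head(string)
print(result)  # ['ccc', 'ddd', 'eee']
```

Splitting the job this way keeps each pattern simple, and no variable-length lookbehind is needed because the first search already discards everything before "true_head".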