Yubin Ruan writes:

> Hi everyone,
> I am struggling writing a right regex that match what I want:
>
> Problem Description:
>
> Given a string like this:
>
> >>>string = "false_head <a>aaa</a> <a>bbb</a> false_tail \
> true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a>
> true_tail"
>
> I want to match the all the text surrounded by those "<a> </a>", but
> only if those "<a> </a>" locate **in some distance** behind
> "true_head". That is, I expect to result to be like this:
>
> >>>import re
> >>>result = re.findall("the_regex",string)
> >>>print result
> ["ccc","ddd","eee"]
>
> How can I write a regex to match that?
> I have try to use the **positive lookbehind assertion** in python regex,
> but it does not allowed variable length of lookbehind.

Don't. Don't even try to do it all in one regex. Keep your regexen
simple and match in two steps.

For example, capture all such elements together with your marker:

    re.findall(r'true_head|<a>[^<]+</a>', string)
    ==>
    ['<a>aaa</a>', '<a>bbb</a>', 'true_head',
     '<a>ccc</a>', '<a>ddd</a>', '<a>eee</a>']

Then filter the result in the obvious way (not involving any regex any
more, unless needed to recognize the true 'true_head' again). I've kept
the tags at this stage, so a possible '<a>true_head</a>' won't look like
'true_head' yet.

Another way is to find 'true_head' first (if you can recognize it safely
before also recognizing the elements), and then capture the elements in
the latter half only.

-- 
https://mail.python.org/mailman/listinfo/python-list
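
The two approaches described in the reply could be sketched roughly as follows. This is only an illustration, not code from the thread: the function names texts_after_marker and texts_after_marker_split are hypothetical, the sample string is written as one logical string instead of using the backslash continuation, and nothing here stops at a closing marker such as 'true_tail'.

```python
import re

# Sample input from the question, joined into one logical string.
string = (
    "false_head <a>aaa</a> <a>bbb</a> false_tail "
    "true_head some_text_here <a>ccc</a> <a>ddd</a> <a>eee</a> "
    "true_tail"
)


def texts_after_marker(text, marker="true_head"):
    """Two-step approach: capture marker and elements, then filter."""
    # Step 1: one simple regex finds the marker and every element, in order.
    tokens = re.findall(re.escape(marker) + r"|<a>[^<]+</a>", text)
    # Step 2: plain filtering, no regex involved.
    seen_marker = False
    result = []
    for token in tokens:
        if token == marker:
            seen_marker = True
        elif seen_marker:
            # Strip the surrounding tags: "<a>ccc</a>" -> "ccc".
            result.append(token[len("<a>"):-len("</a>")])
    return result


def texts_after_marker_split(text, marker="true_head"):
    """Alternative: locate the marker first, then search only the tail.

    Assumes the first occurrence of `marker` is the real one; a marker
    hidden inside an element (e.g. '<a>true_head</a>') would fool it.
    """
    _, found, tail = text.partition(marker)
    if not found:
        return []
    return re.findall(r"<a>([^<]+)</a>", tail)


print(texts_after_marker(string))
print(texts_after_marker_split(string))
```

The first function keeps the tags until the filtering step, so an element whose text happens to be 'true_head' is not mistaken for the marker. The second is shorter but relies on the marker being safely recognizable on its own.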