Gabriel Murray wrote:
> Hello, I'm looking for a regular expression which will match strings as
> follows: if there are symbols a, b, c and d, then any pattern is valid if it
> begins with a and ends with d and proceeds in order through the symbols.
> However, at any point the pattern may reset to an earlier position in the
> sequence and begin again from there.
> For example, these would be valid patterns:
> aabbbaabbcccbbbcccddd
> aabcabcd
> abcd
>
> But these would not:
> aaaaabbbbbccccaaaaadddd (goes straight from a to d)
> aaaaaaaaaaabbbbbccc (does not reach d)
>
> Can anyone think of a concise way of writing this regex? The ones I can
> think of are very long and awkward.
> Gabriel
>
It's a bit ugly, but

    import re

    tests = [
        ('aabbbaabbcccbbbcccddd', True),
        ('aabcabcd', True),
        ('abcd', True),
        ('aaaaabbbbbccccaaaaadddd', False),
        ('aaaaaaaaaaabbbbbccc', False),
    ]

    # One or more a/b runs, then c runs that may each be followed by
    # a reset to a or b, then a final run of d.
    regex = r'^(a+b+)+(c+(a*b+)*)+d+$'
    r = re.compile(regex)

    for test, expected in tests:
        matched = (r.match(test) is not None)
        if matched == expected:
            print("PASSED: %s with %s" % (test, expected))
        else:
            print("FAILED: %s with %s" % (test, expected))

passes all the tests you suggested.

One test that stands out to me as an undefined case would be abcdcd (where, after reaching d, the pattern backtracks again). The regex currently assumes that nothing but "d"s follow "d"s.

-tkc
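If the undefined case (abcdcd) should count as valid, one possible variant is sketched below. It assumes, which the original question does not confirm, that a reset is allowed even after d has been reached. Under that assumption the rule reduces to "starts with a, ends with d, and never skips forward over a symbol", so a negative lookahead for the pairs ac, ad and bd is enough. The names LENIENT and is_valid_lenient are made up for illustration.

```python
import re

# Assumption (not confirmed in the thread): a reset is allowed even
# after "d" has been reached. A string is then valid when it starts
# with "a", ends with "d", and never jumps forward past a symbol,
# i.e. it contains none of the pairs "ac", "ad" or "bd".
LENIENT = re.compile(r'^(?!.*(?:ac|ad|bd))a[abcd]*d$')


def is_valid_lenient(s):
    """Return True if s matches the lenient reading of the rules."""
    return LENIENT.match(s) is not None


if __name__ == "__main__":
    for s in ("abcd", "abcdcd", "aaaaabbbbbccccaaaaadddd"):
        print("%s -> %s" % (s, is_valid_lenient(s)))
```

This gives the same answers as the regex above on the five suggested tests and only differs on strings where something other than d follows a d.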