This might be more flexible:

import re

pat = re.compile(r"^(a*(?=b)b*(?=[ac])c*(?=[abd])d*)+$")

tests = [
    ('aabbbaabbcccbbbcccddd', True),
    ('aabcabcd', True),
    ('abcd', True),
    ('aabbccaabbccabcabababbbccccdddd', True),
    ('aabbccaabbccabcabababbbccccddddabcd', True),
    ('aaaaabbbbbccccaaaaadddd', False),
    ('aaaaaaaaaaabbbbbccc', False),
    ('aabbccaabbccacabababbbccccdddd', False),
    ('aabbccaabbccabcdddcabababbbccccdddd', False),
]
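A minimal sketch of how the table could be run against the pattern. The pattern and the test pairs are copied from this message; the helper name run_tests is only an illustration and is not part of the original post.

```python
import re

# Pattern and (string, expected) pairs as given in this message.
pat = re.compile(r"^(a*(?=b)b*(?=[ac])c*(?=[abd])d*)+$")

tests = [
    ('aabbbaabbcccbbbcccddd', True),
    ('aabcabcd', True),
    ('abcd', True),
    ('aabbccaabbccabcabababbbccccdddd', True),
    ('aabbccaabbccabcabababbbccccddddabcd', True),
    ('aaaaabbbbbccccaaaaadddd', False),
    ('aaaaaaaaaaabbbbbccc', False),
    ('aabbccaabbccacabababbbccccdddd', False),
    ('aabbccaabbccabcdddcabababbbccccdddd', False),
]


def run_tests(pattern, cases):
    """Return (string, expected, got) for every case that disagrees.

    Hypothetical helper, named here for illustration only.
    """
    failures = []
    for text, expected in cases:
        got = pattern.match(text) is not None
        if got != expected:
            failures.append((text, expected, got))
    return failures


failures = run_tests(pat, tests)
print("%d of %d cases passed" % (len(tests) - len(failures), len(tests)))
for text, expected, got in failures:
    print("FAILED: %s (expected %s, got %s)" % (text, expected, got))
```

The lookaheads are what keep the order honest: (?=b) forces every pass through the group to reach 'b' right after the 'a's, and (?=[ac]) and (?=[abd]) restrict which symbol may follow the 'b's and the 'c's.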

It works with all of the test cases you've given, and will also work in
the case where 'd' is followed by '[abc]'.

Tim Chase wrote:
> Gabriel Murray wrote:
> > Hello, I'm looking for a regular expression which will match strings as
> > follows: if there are symbols a, b, c and d, then any pattern is valid if it
> > begins with a and ends with d and proceeds in order through the symbols.
> > However, at any point the pattern may reset to an earlier position in the
> > sequence and begin again from there.
> > For example, these would be valid patterns:
> > aabbbaabbcccbbbcccddd
> > aabcabcd
> > abcd
> >
> > But these would not:
> > aaaaabbbbbccccaaaaadddd (goes straight from a to d)
> > aaaaaaaaaaabbbbbccc (does not reach d)
> >
> > Can anyone think of a concise way of writing this regex? The ones I can
> > think of are very long and awkward.
> > Gabriel
> >
> >
>
> It's a bit ugly, but
>
> import re
>
> tests = [
>     ('aabbbaabbcccbbbcccddd', True),
>     ('aabcabcd', True),
>     ('abcd', True),
>     ('aaaaabbbbbccccaaaaadddd', False),
>     ('aaaaaaaaaaabbbbbccc', False),
> ]
>
> regex = r'^(a+b+)+(c+(a*b+)*)+d+$'
> r = re.compile(regex)
> for test, expected in tests:
>     matched = (r.match(test) is not None)
>     if matched == expected:
>         print("PASSED: %s with %s" % (test, expected))
>     else:
>         print("FAILED: %s with %s" % (test, expected))
>
>
> passes all the tests you suggested.
>
> One test that stands out to me as an undefined case would be
>
>     abcdcd
>
> (where, after reaching D, the pattern backtracks again).
>
> It currently assumes nothing but "d"s follow "d"s.
>
> -tkc

--
http://mail.python.org/mailman/listinfo/python-list
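
On the open question of what happens when the string carries on after 'd', here is a small sketch that probes both patterns from this thread. The probe strings and the names lookahead_pat, tkc_pat and matches are made up for illustration; only the two regexes themselves come from the messages above.

```python
import re

# Both regexes are quoted from the thread.
lookahead_pat = re.compile(r"^(a*(?=b)b*(?=[ac])c*(?=[abd])d*)+$")
tkc_pat = re.compile(r"^(a+b+)+(c+(a*b+)*)+d+$")

# Hypothetical probes (not from the thread): each reaches 'd', then
# resets to 'a', 'b' or 'c' and runs through to 'd' again.
probes = ['abcdabcd', 'abcdbcd', 'abcdcd']


def matches(pattern, text):
    """True if the whole string is accepted by the anchored pattern."""
    return pattern.match(text) is not None


# Map each probe to (lookahead pattern result, Tim's pattern result).
results = {p: (matches(lookahead_pat, p), matches(tkc_pat, p)) for p in probes}

for text in probes:
    print("%-10s lookahead=%-5s tkc=%s" % ((text,) + results[text]))
```

As far as I can tell, the lookahead version accepts a reset from 'd' to 'a' or 'b' but rejects a reset straight to 'c', because every pass through the group has to see a 'b' first. That agrees with the last False entry in the test table. Tim's version rejects all three probes, since it allows nothing but 'd's after the first 'd'.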