Gabriel Murray <gabriel.murray <at> gmail.com> writes:
>
> Hello, I'm looking for a regular expression which will match strings as
> follows: if there are symbols a, b, c and d, then any pattern is valid if
> it begins with a and ends with d and proceeds in order through the
> symbols. However, at any point the pattern may reset to an earlier
> position in the sequence and begin again from there.
>
> For example, these would be valid patterns:
>
> aabbbaabbcccbbbcccddd
> aabcabcd
> abcd
>
> But these would not:
>
> aaaaabbbbbccccaaaaadddd (goes straight from a to d)
> aaaaaaaaaaabbbbbccc (does not reach d)
>
> Can anyone think of a concise way of writing this regex? The ones I can
> think of are very long and awkward.
>
> Gabriel
>

Your criteria can be stated more simply as follows:

* the string must start with an 'a' and end with a 'd'
* an 'a' must not be followed by a 'c' or a 'd'
* a 'b' must not be followed by a 'd'

The regexp can therefore be written as:

    import re

    regexp = re.compile(r'''
        a(?!c|d)          # leading 'a', under the same rule as any other 'a'
        (?:
            a(?!c|d)      # an 'a' must not be followed by 'c' or 'd'
          | b(?!d)        # a 'b' must not be followed by 'd'
          | c
          | d
        )*
        d \Z              # final 'd'; match() only anchors the start, so
                          # \Z is needed to anchor the end of the string
        ''', re.VERBOSE)

Test code:

    tests = [
        ('abcd', True),
        ('aaaaaaaaaaabbbbbccc', False),
        ('aabbccaabbccabcdddcabababbbccccdddd', True),
        ('aabbccaabbccabcabababbbccccddddabcd', True),
        ('aaaaabbbbbccccaaaaadddd', False),
        ('aabbccaabbccacabababbbccccdddd', False),
        ('abccccdaaaabbbbccccd', True),
        ('abcdcd', True),
        ('aabbbaabbcccbbbcccddd', True),
        ('aabbccaabbccabcabababbbccccdddd', True),
        ('abccccdccccd', True),
        ('aabcabcd', True)
    ]

    def checkit(regexp, tests=tests):
        # Compare the regexp's verdict with the expected one for each string.
        for test, expected in tests:
            matched = regexp.match(test) is not None
            if matched == expected:
                print("PASSED: %s with %s" % (test, expected))
            else:
                print("FAILED: %s with %s" % (test, expected))

    >>> checkit(regexp, tests)
    PASSED: abcd with True
    PASSED: aaaaaaaaaaabbbbbccc with False
    PASSED: aabbccaabbccabcdddcabababbbccccdddd with True
    PASSED: aabbccaabbccabcabababbbccccddddabcd with True
    PASSED: aaaaabbbbbccccaaaaadddd with False
    PASSED: aabbccaabbccacabababbbccccdddd with False
    PASSED: abccccdaaaabbbbccccd with True
    PASSED: abcdcd with True
    PASSED: aabbbaabbcccbbbcccddd with True
    PASSED: aabbccaabbccabcabababbbccccdddd with True
    PASSED: abccccdccccd with True
    PASSED: aabcabcd with True

- Tal

--
http://mail.python.org/mailman/listinfo/python-list
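
The three rules in the reply can also be written without a regexp, which gives a way to cross-check a pattern beyond a hand-picked test list. What follows is only a sketch under stated assumptions: the names PATTERN, FORBIDDEN_PAIRS, follows_rules and mismatches are made up for illustration, the pattern is a variant in which the leading 'a' carries the same lookahead as every other 'a', and it relies on fullmatch(), which needs Python 3.4 or later.

```python
import itertools
import re

# Assumed variant of the regexp: the leading 'a' carries the same lookahead
# as any other 'a', and fullmatch() is used below so that the final 'd'
# must be the last character of the string.
PATTERN = re.compile(r"a(?![cd]) (?: a(?![cd]) | b(?!d) | c | d )* d", re.VERBOSE)

# Adjacent pairs ruled out by "a not followed by c or d, b not followed by d".
FORBIDDEN_PAIRS = ("ac", "ad", "bd")


def follows_rules(s):
    """Plain-Python statement of the three rules (hypothetical helper)."""
    return (
        s.startswith("a")
        and s.endswith("d")
        and not any(pair in s for pair in FORBIDDEN_PAIRS)
    )


def mismatches(max_len=6):
    """Return every string over 'abcd' up to max_len where the two checks disagree."""
    found = []
    for n in range(1, max_len + 1):
        for chars in itertools.product("abcd", repeat=n):
            s = "".join(chars)
            if (PATTERN.fullmatch(s) is not None) != follows_rules(s):
                found.append(s)
    return found


if __name__ == "__main__":
    # An empty list means the regexp and the rules agree on every string tried.
    print(mismatches())
```

Enumerating every string up to a small length is cheap here (a few thousand candidates for max_len=6) and tends to surface edge cases such as 'ad' or 'abcda' that a short list of examples can miss.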