On Thu, 26 Jan 2006 16:26:57 GMT in comp.lang.python, "Roger L. Cauvin" <[EMAIL PROTECTED]> wrote:
>"Christos Georgiou" <[EMAIL PROTECTED]> wrote in message >news:[EMAIL PROTECTED] [...] >> Is this what you mean? >> >> ^[^a]*(a{3})(?:[^a].*)?$ > >Close, but the pattern should allow "arbitrary sequence of characters" that >precede the alternating a's and b's to contain the letter 'a'. In other >words, the pattern should accept: > >"xayz123aaabbab" > >since the 'a' between the 'x' and 'y' is not directly followed by a 'b'. > I don't know an RE is the best solution to this problem. If I understand the problem correctly, building a state machine to solve this is trivial. The following took about 5 minutes of coding: ---begin included file # Define our states. # state[0] is next state if character is 'a' # state[1] is next state if character is 'b' # state[2] is next state for any other character # Accept state means we've found a match Accept = [] for i in range(3): Accept.append(Accept) # Reject state means the string cannot match Reject = [] for i in range(3): Reject.append(Reject) # Remaining states: Start, 'a' found, 'aa', 'aaa', and 'aaaa' Start = [0,1,2] a1 = [0,1,2] a2 = [0,1,2] a3 = [0,1,2] a4 = [0,1,2] # Start: looking for first 'a' Start[0] = a1 Start[1] = Start Start[2] = Start # a1: 1 'a' found so far a1[0] = a2 a1[1] = Reject a1[2] = Start # a2: 'aa' found a2[0] = a3 a2[1] = Reject a2[2] = Start # a3: 'aaa' found a3[0] = a4 a3[1] = Accept a3[2] = Start # a4: four or more 'a' in a row a4[0] = a4 a4[1] = Reject a4[2] = Start def detect(s): """ Return 1 if first substring aa*b has exactly 3 a's Return 0 otherwise """ state = Start for c in s: if c == 'a': state = state[0] elif c == 'b': state = state[1] else: state = state[2] if state is Accept: return 1 return 0 print detect("xyza123abc") print detect("xyzaaa123aabc") print detect("xyzaa123aaabc") print detect("xyza123aaaabc") --- end included file --- And I'm pretty sure it does what you need, though it's pretty naive. Note that if '3' isn't a magic number, states a1, a2, a3, and a4 could be re-implemented as a single state with a counter, but the logic inside detect gets a little hairier. I haven't timed it, but it's not doing anything other than simple comparisons and assignments. It's a little (OK, a lot) more code than a simple RE, but I know it works. HTH, -=Dave -- Change is inevitable, progress is not. -- http://mail.python.org/mailman/listinfo/python-list