[EMAIL PROTECTED] wrote: > Your code will identify sequences in a list, but how to index them? I > have an idea, which seems ridiculously long-winded, but should work. > First, put the groups from the âmake_diffsâ function into a list > and do a search for the sequence identified from the > âcompare_formsâ function using the âKnuth-Morris-Prattâ > function in the python cookbook. Then the positions in that list will > obviously be the same as those in the original. > > Next is the problem of the interval between the individual groups of > the sequence. Using something along the lines of your âmake_diffsâ > it should be a simple matter to see by what interval the groups in the > sequence are transposed.
Hold on! Adding the index position and transposition offset is a small change to the existing code: # Good sequences s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e'] s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c'] s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b'] # Bad sequences b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] def make_groups(sequence, n, anywhere = False): """yields tuples of sub-sequences of length n, and their position from sequence. bool: anywhere False: look only at positions n*i for i = 1,2,3... True: return every (overlapping sequence) of length n Usage: >>> list(make_groups(s1,3)) [(['a', 'b', 'c'], (0, 3)), (['b', 'c', 'd'], (3, 6)), (['c', 'd', 'e'], (6, 9))] """ length = len(sequence) if (length % n) and not anywhere: raise ValueError, "Sequence length %s is not multiple of %s"\ % (length, n) for i in xrange(0, length-n+1, anywhere or n): yield sequence[i:i+n], (i, i+n) def ord_note(note, note_base = "a"): """Return the ordinal value of a note, or the interval between two notes""" return (ord(note)-ord(note_base)) % 7 def make_diffs(groups): """return groups in canonical form where the first note in each sequence is 0, and each subsequent note is expressed as a positive interval from the first. Example: >>> list(make_diffs(make_groups(s1,3))) [((0, 1, 2), (0, 3)), ((0, 1, 2), (3, 6)), ((0, 1, 2), (6, 9))] """ for group, index in groups: seq, base_note = [0], group[0] for note in group[1:]: interval = ord_note(note, base_note) seq.append(interval) yield tuple(seq), index, base_note def group_forms(sequence, anywhere = False, max_length = 4): """group sequences of similar form yields {form_tuple: [index tuples where form occurs]} starting with longer forms """ for length in range(max_length, 1, -1): forms = {} try: for group, index, base_note in make_diffs(make_groups(sequence, length, anywhere)): forms.setdefault(group,[]).append((index,base_note)) yield forms except ValueError: # raised if the sequence is not a multiple of length pass def search(sequence, max_length = 4, min_repeats = 3): """print all sequences that occur least min_repeats times""" found = 0 print "Finding forms length 2-%s, occuring at minimum %s times" \ % (max_length, min_repeats) for form_dict in group_forms(sequence, anywhere = True): for form, data in form_dict.iteritems(): if len(data) >= min_repeats: print "%s:%s" % (form, ["%s:%s" % (index, ord_note(offset)) for index, offset in data]) found +=1 print "Found: %s" % found >>> search(s1) Finding forms length 2-4, occuring at minimum 3 times (0, 1, 2):['(0, 3):0', '(3, 6):1', '(6, 9):2'] (0, 1):['(0, 2):0', '(1, 3):1', '(3, 5):1', '(4, 6):2', '(6, 8):2', '(7, 9):3'] Found: 2 Given the increasingly complex tuples between the different stages it would probably be worth defining a class to represent a form_match object Michael -- http://mail.python.org/mailman/listinfo/python-list