The problem I'm solving is to take a sequence like 'ATSGS' and make all the DNA sequences it represents. The A, T, and G are fine but the S represents C or G. I want to take this input:

[ [ 'A' ] , [ 'T' ] , [ 'C' , 'G' ], [ 'G' ] , [ 'C' , 'G' ] ]

and make the list:

[ 'ATCGC' , 'ATCGG' , 'ATGGC' , 'ATGGG' ]
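
One way to read this: the wanted list is the Cartesian product of the per-position choices, joined back into strings. A minimal sketch using the standard library's itertools.product, where the names `choices` and `sequences` are only illustrative:

```python
from itertools import product

# Per-position choices, copied from the example input above.
choices = [['A'], ['T'], ['C', 'G'], ['G'], ['C', 'G']]

# product(*choices) yields one tuple per combination, taking one base
# from each position; the rightmost position varies fastest.
sequences = [''.join(combo) for combo in product(*choices)]
print(sequences)  # ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```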

The code below is what I have so far: 'alphabet' is a dictionary that designates the set of base pairs that each letter represents (for example, for S above it gives C and G). I call these ambiguous base pairs because they could be more than one, thus the function name 'unambiguate'. It makes a list of sequences with only A, T, C, and G and none of the ambiguous base pair designations.

The function 'unambiguate_bp' takes a sequence and the index of a base pair in it and returns a list of sequences with that base pair replaced by each of its unambiguous possibilities.

The function 'unambiguate_seq' takes a sequence and runs unambiguate_bp on each position in the sequence. After each position, it replaces the list of sequences it's working on with the combined output from unambiguate_bp. It's a bit confusing. I'd like it to be clearer.

Is there a better way to do this?
--
David Siedband
generation-xml.com



def unambiguate_bp(seq, bp):
    # Return one copy of seq for each base the letter at index bp can stand for.
    seq_set = []
    for i in alphabet[seq[bp]]:
        seq_set.append(seq[:bp]+i+seq[bp+1:])
    return seq_set

def unambiguate_seq(seq):
    # Expand one position at a time; result holds every sequence built so far.
    result = [seq]
    for i in range(len(seq)):
        result_tmp = []
        for j in result:
            result_tmp = result_tmp + unambiguate_bp(j, i)
        result = result_tmp
    return result
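
The same position-by-position idea might read more clearly if the sequences are grown from the left instead of being sliced and patched. This is only a sketch: `expand_by_prefix` and `demo_alphabet` are made-up names, and the table here is a cut-down stand-in for the full 'alphabet' dictionary.

```python
def expand_by_prefix(seq, alphabet):
    """Return every unambiguous sequence that seq stands for."""
    prefixes = ['']  # start with a single empty prefix
    for letter in seq:
        # Extend every prefix built so far with each base this letter allows.
        prefixes = [prefix + base
                    for prefix in prefixes
                    for base in alphabet[letter]]
    return prefixes

# Cut-down table for illustration only (hypothetical name).
demo_alphabet = {'A': ['A'], 'T': ['T'], 'G': ['G'], 'S': ['C', 'G']}
print(expand_by_prefix('ATSGS', demo_alphabet))
```

Because each pass only appends to the prefixes, there is no index bookkeeping, and the list after each pass holds all expansions of the letters seen so far.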



alphabet = {
        'A' : ['A'],
        'T' : ['T'],
        'C' : ['C'],
        'G' : ['G'],
        'W' : ['A','T'],
        'M' : ['A','C'],
        'R' : ['A','G'],
        'Y' : ['T','C'],
        'K' : ['T','G'],
        'S' : ['C','G'],
        'H' : ['A','T','C'],
        'D' : ['A','T','G'],
        'V': ['A','G','C'],
        'B' : ['C','T','G'],
        'N' : ['A','T','C','G']
        }
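
With a table like this, a sequence full of N's expands very quickly (4 possibilities per N), so it may help to count the expansions first and to generate them lazily. A sketch under assumptions: `AMBIGUITY_CODES`, `count_expansions`, and `iter_expansions` are hypothetical names, and the table repeats the contents of 'alphabet' as strings so the example is self-contained.

```python
from itertools import product

# Same letter-to-bases table as 'alphabet' above, written as strings for brevity.
AMBIGUITY_CODES = {
    'A': 'A', 'T': 'T', 'C': 'C', 'G': 'G',
    'W': 'AT', 'M': 'AC', 'R': 'AG', 'Y': 'TC', 'K': 'TG', 'S': 'CG',
    'H': 'ATC', 'D': 'ATG', 'V': 'AGC', 'B': 'CTG',
    'N': 'ATCG',
}

def count_expansions(seq):
    """Number of unambiguous sequences seq stands for, without building them."""
    total = 1
    for letter in seq:
        total *= len(AMBIGUITY_CODES[letter])
    return total

def iter_expansions(seq):
    """Lazily yield each unambiguous sequence, one string at a time."""
    per_position = (AMBIGUITY_CODES[letter] for letter in seq)
    for combo in product(*per_position):
        yield ''.join(combo)

print(count_expansions('ATSGS'))       # 4
print(list(iter_expansions('ATSGS')))  # ['ATCGC', 'ATCGG', 'ATGGC', 'ATGGG']
```

An unknown letter raises KeyError here, which is one possible way to flag bad input.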

--
http://mail.python.org/mailman/listinfo/python-list
