James Stroud wrote:
> nuttydevil wrote:
>
>> I have many notepad documents that all contain long chunks of genetic
>> code. They look something like this:
>>
>> atggctaaactgaccaagcgcatgcgtgttatccgcgagaaagttgatgcaaccaaacag
>> tacgacatcaacgaagctatcgcactgctgaaagagctggcgactgctaaattcgtagaa
>> agcgtggacgtagctgttaacctcggcatcgacgctcgtaaatctgaccagaacgtacgt
>> ggtgcaactgtactgccgcacggtactggccgttccgttcgcgtagccgtatttacccaa
>>
>> Basically, I want to design a program using python that can open and
>> read these documents. However, I want them to be read 3 base pairs at a
>> time (to analyse them codon by codon) and find the value that each
>> codon has a value assigned to it. An example of this is below:
>>
>> ** If the three base pairs were UUU the value assigned to it (from the
>> codon value table) would be 0.296
>>
>> The program has to read all the sequence three pairs at a time, then I
>> want to get all the values for each codon, multiply them together and
>> put them to the power of 1 / the length of the sequence in codons
>> (which is the length of the whole sequence divided by three).
>>
>> However, to make things even more complicated, the notebook sequences
>> are in lowercase and the codon value table is in uppercase, so the
>> sequences need to be converted into uppercase. Also, the Ts in the DNA
>> sequences need to be changed to Us (again to match the codon value
>> table). And finally, before the DNA sequences are read and analysed I
>> need to remove the first 50 codons (i.e. the first 150 letters) and the
>> last 20 codons (the last 60 letters) from the DNA sequence. I've also
>> been having problems ensuring the program reads ALL the sequence 3
>> letters at a time.
>>
>> I've tried various ways of doing this but keep coming unstuck along the
>> way. Has anyone got any suggestions for how they would tackle this
>> problem?
>
>
> Yes: use python.
>
>> Thanks for any help recieved!
>>
>
> I couldn't help myself. I strongly suggest you study this example. It
> will cut your coding time way down in the future.
>
> I'm writing your name down and this is the last time I'm doing homework
> for you.
>
> James
>
>
> from operator import mul
>
> table = { 'AUG' : 0.98999, 'CCC' : 0.9755 } # <== you fill this in
> trim_front = 50
> trim_back = 20
>
> # Why I did this:
> # Python >=1 line per thought; you have to love it
> data = "".join([s.strip() for s in open(filename)])
> data = data.upper().replace('T', 'U')
> codons = [data[i:i+3] for i in xrange(0, len(data), 3)] # Alex Martelli
> trimmed = codons[trim_front:-trim_back]
> product = reduce(mul, [table[codon] for codon in codons])
> value = product**(1.0/len(trimmed)) # <== is this really ALL codons?
>
> print value # useless print statement
>
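A note on the quoted script: as written, the product is taken over `codons` while the root uses `len(trimmed)`, so the trimmed-off codons still get multiplied in. It also leaves `filename` undefined, and a sequence whose length is not a multiple of three would yield a short final "codon" that is not in the table. Below is a minimal sketch of how those points might be handled, in Python 3 spelling (`range`, `functools.reduce`, `print()`). The function name `codon_score`, its parameters, and the toy table values in the demo are my own assumptions for illustration, not part of the original post.

```python
from functools import reduce
from operator import mul


def codon_score(lines, table, trim_front=50, trim_back=20):
    """Geometric mean of the table values over the trimmed codons.

    `lines` is any iterable of strings, so an open file works too.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    # Join the lines, uppercase, and turn DNA letters into RNA (T -> U).
    data = "".join(line.strip() for line in lines).upper().replace("T", "U")
    # Split into complete codons; a trailing partial codon is dropped.
    usable = len(data) - len(data) % 3
    codons = [data[i:i + 3] for i in range(0, usable, 3)]
    # Use an explicit end index so that trim_back=0 keeps the tail.
    trimmed = codons[trim_front:len(codons) - trim_back]
    if not trimmed:
        raise ValueError("no codons left after trimming")
    # Multiply over the trimmed codons only, then take the n-th root.
    product = reduce(mul, [table[codon] for codon in trimmed])
    return product ** (1.0 / len(trimmed))


# Toy demonstration with made-up table values (not real codon data).
toy_table = {"AUG": 4.0, "CCC": 9.0}
toy_lines = ["atgatg\n", "cccccc\n"]
print(codon_score(toy_lines, toy_table, trim_front=1, trim_back=1))
```

With a real file this would be called as `codon_score(open(filename), table)`. A codon missing from the table raises `KeyError`, which is probably what you want while the table is still being filled in. For long sequences with values well below 1, the running product may underflow to zero; summing `math.log` of the values and exponentiating the mean is one way around that.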
I noticed a typo. Should be "Python <= 1 line per thought".

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
--
http://mail.python.org/mailman/listinfo/python-list