As I understand biology, there are 4 nucleotides, which gives 4**2 = 16 combinations for doublets. So you need 16 variables to count the occurrences of all doublets. It's worse for triplets (4**3 = 64).
As I understand genetics, triplets are what matter, since the ribosome reads the mRNA in triplets (codons), each coding for an amino acid. Feel free to update my biology knowledge :-)
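A note on the counting itself: one variable per combination isn't needed. The minimal Perl sketch below keeps every overlapping k-mer in a single hash, so the same code handles the 16 doublets and the 64 triplets. The sub names count_kmers and all_kmers are my own, not taken from Henry's programs.

```perl
use strict;
use warnings;

# Count every overlapping k-mer (k=2 for doublets, k=3 for triplets)
# in a DNA string, using one hash instead of one variable per combination.
sub count_kmers {
    my ($seq, $k) = @_;
    my %count;
    # Slide a window of width $k along the sequence one base at a time.
    for my $i (0 .. length($seq) - $k) {
        $count{ substr($seq, $i, $k) }++;
    }
    return \%count;
}

# List every possible k-mer: 4**2 = 16 doublets, 4**3 = 64 triplets.
sub all_kmers {
    my ($k) = @_;
    my @kmers = ('');
    for (1 .. $k) {
        # Extend each existing prefix by each of the four bases.
        @kmers = map { my $p = $_; map { $p . $_ } qw(A C G T) } @kmers;
    }
    return @kmers;
}

my $counts = count_kmers('ACGTACGA', 2);
for my $pair (all_kmers(2)) {
    # Pairs that never occur are absent from the hash, hence "// 0".
    printf "%s %d\n", $pair, $counts->{$pair} // 0;
}
```

Switching from doublets to triplets is then just a matter of passing 3 instead of 2.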
Wolf -
It's been a while since my A-Level biology days, but I believe you're correct. However, this particular piece of coursework was to write two programs with a different purpose from the one I think you're imagining:
transition.pl: returns tables of transition probabilities for plus and minus models (exon and non-exon regions) as well as beta values (log-odds ratios) to compare the two models.
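Assuming the usual definition of the beta values, beta(XY) = log2( tp+(XY) / tp-(XY) ), a sequence can be scored by summing beta over its transitions. Here is a minimal Perl sketch; the base-2 logarithm, the sub names and the toy probabilities are all my assumptions, not taken from transition.pl.

```perl
use strict;
use warnings;

# beta(XY) = log2( tp+(XY) / tp-(XY) ) for every transition in the plus model.
# Pairs where either probability is 0 are skipped (log-odds undefined there).
sub beta_table {
    my ($plus, $minus) = @_;
    my %beta;
    for my $x (keys %$plus) {
        for my $y (keys %{ $plus->{$x} }) {
            my ($p, $m) = ($plus->{$x}{$y} // 0, $minus->{$x}{$y} // 0);
            next unless $p > 0 && $m > 0;
            $beta{$x}{$y} = log($p / $m) / log(2);
        }
    }
    return \%beta;
}

# Sum beta over every transition in $seq: a positive total favours the
# plus (exon) model, a negative total the minus (non-exon) model.
# Transitions with no beta value contribute 0 (a simplifying assumption).
sub log_odds_score {
    my ($beta, $seq) = @_;
    my $score = 0;
    for my $i (0 .. length($seq) - 2) {
        my ($x, $y) = (substr($seq, $i, 1), substr($seq, $i + 1, 1));
        $score += $beta->{$x}{$y} // 0;
    }
    return $score;
}

# Made-up two-letter models, just to exercise the code.
my %plus  = (A => { A => 0.5,  T => 0.5  }, T => { A => 0.5, T => 0.5 });
my %minus = (A => { A => 0.25, T => 0.75 }, T => { A => 0.5, T => 0.5 });
my $beta  = beta_table(\%plus, \%minus);
printf "beta(AA)   = %.3f\n", $beta->{A}{A};                 # 1.000
printf "score(AAA) = %.3f\n", log_odds_score($beta, 'AAA');  # 2.000
```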
The transition probability for AT, for example (the probability that adenine is followed by thymine), is calculated as:
tp(AT) = |AT| / |A_|
that is, the number of occurrences of "AT" divided by the number of occurrences of "A" followed by any base.
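Counting |AT| and |A_| over the overlapping pairs of a single training sequence could look like the minimal Perl sketch below. The sub name is hypothetical, and with several training sequences you would sum the counts over all of them before dividing.

```perl
use strict;
use warnings;

# Estimate first-order Markov transition probabilities from one sequence:
#   tp(XY) = |XY| / |X_|
# where |XY| counts overlapping occurrences of X followed by Y and |X_|
# counts occurrences of X followed by any base.
sub transition_probs {
    my ($seq) = @_;
    my (%pair, %from);
    for my $i (0 .. length($seq) - 2) {
        my ($x, $y) = (substr($seq, $i, 1), substr($seq, $i + 1, 1));
        $pair{$x}{$y}++;    # |XY|
        $from{$x}++;        # |X_|: X followed by anything
    }
    my %tp;
    for my $x (qw(A C G T)) {
        for my $y (qw(A C G T)) {
            # Leave the row at 0 if X never appears before another base.
            $tp{$x}{$y} = $from{$x} ? ($pair{$x}{$y} // 0) / $from{$x} : 0;
        }
    }
    return \%tp;
}

my $tp = transition_probs('AATAGATC');
printf "tp(AT) = %.3f\n", $tp->{A}{T};   # 2 of the 4 "A_" pairs are "AT": 0.500
```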
The program can also write the transition probabilities to a file to be used as input for the other program...
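I don't know the real format of the model file, so the following minimal Perl sketch assumes a hypothetical plain-text layout with one "X Y probability" line per transition, plus a matching reader. Both sub names are mine.

```perl
use strict;
use warnings;

# Write a model as lines of "X Y probability" (hypothetical format).
sub write_model {
    my ($path, $tp) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    for my $x (sort keys %$tp) {
        for my $y (sort keys %{ $tp->{$x} }) {
            # %.17g keeps full double precision in the text file.
            printf {$fh} "%s %s %.17g\n", $x, $y, $tp->{$x}{$y};
        }
    }
    close $fh or die "Cannot close $path: $!";
}

# Read the same format back into a hash of hashes: $tp->{X}{Y} = probability.
sub read_model {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    my %tp;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|$)/;    # skip blank and comment lines
        my ($x, $y, $p) = split ' ', $line;
        $tp{$x}{$y} = $p;
    }
    close $fh;
    return \%tp;
}
```

Keeping the file human-readable makes it easy to eyeball the exon and non-exon models side by side.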
simulation.pl: asks the user to specify the length of the sequence they want, then generates a sequence of that length according to the model file supplied as input (by simulating a Markov chain). So if you supply a file containing the transition probabilities of a typical exon (coding) region, the simulation will use them to generate a typical exon sequence.
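The core of simulating the Markov chain is drawing each next base from the row of transition probabilities for the current base. A minimal Perl sketch, assuming the model is held as a hash of hashes ($tp->{X}{Y}) and that the first base is chosen uniformly at random; simulation.pl may well do either differently (e.g. with initial-state probabilities), and the sub names and toy model are hypothetical.

```perl
use strict;
use warnings;

# Draw one base from a row of transition probabilities by walking the
# cumulative distribution until it passes a uniform random number.
sub next_base {
    my ($row) = @_;
    my ($r, $cum, $last) = (rand(), 0, undef);
    for my $y (qw(A C G T)) {
        my $p = $row->{$y} // 0;
        next unless $p > 0;
        $last = $y;
        $cum += $p;
        return $y if $r < $cum;
    }
    # Getting here means the row was empty (die) or rounding left its
    # total a hair below 1 (fall back to the last usable base).
    die "Row has no non-zero probabilities\n" unless defined $last;
    return $last;
}

# Generate a sequence of $length bases from the model $tp.
sub simulate {
    my ($tp, $length, $start) = @_;
    return '' if $length <= 0;
    my $seq = $start // (qw(A C G T))[int rand 4];   # assumed uniform start
    while (length($seq) < $length) {
        my $current = substr($seq, -1);
        my $row = $tp->{$current} or die "No transitions from '$current'\n";
        $seq .= next_base($row);
    }
    return $seq;
}

srand(42);   # fixed seed so a run can be repeated
# Made-up model that favours A->T and T->A, just for illustration.
my %model = (
    A => { A => 0.1,  C => 0.1,  G => 0.1,  T => 0.7  },
    C => { A => 0.25, C => 0.25, G => 0.25, T => 0.25 },
    G => { A => 0.25, C => 0.25, G => 0.25, T => 0.25 },
    T => { A => 0.7,  C => 0.1,  G => 0.1,  T => 0.1  },
);
print simulate(\%model, 60), "\n";
```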
Thanks very much to everyone who has offered further advice on this problem. I now know that my method of counting the dinucleotides in the input sequence is a little brain-dead; however, it works, and I've learnt from it. I'm looking forward to my next foray into the world of Perl.
Regards,
Henry.