Wijaya Edward wrote:
> Dear Experts,
>
> I am looking for a really efficient way to compute a position weight matrix
> (PWM) from a set of strings. In each set the strings are of the same length.
> Basically, a PWM gives the frequency (or probability) with which each base
> [ATCG] occurs in each position/column of the strings. For example, the set
> of strings below:
>
> AAA
> ATG
> TTT
> GTC
>
> (Note that the length of the strings in a set may be greater than 3.)
>
> would give the following result:
>
> $VAR1 = {
>     'A' => [2,1,1],
>     'T' => [1,3,1],
>     'C' => [0,0,1],
>     'G' => [1,0,1]
> };
>
> So the size of each array is the same as the length of the strings.
> In my case I need a variation of it, namely the probability of
> each base occurring in a particular position:
>
> $VAR = {
>     'A' => ['0.5','0.25','0.25'],
>     'T' => ['0.25','0.75','0.25'],
>     'C' => ['0','0','0.25'],
>     'G' => ['0.25','0','0.25']
> }
>
> In this link you can find my incredibly naive and inefficient code.
> Can anybody suggest a better and faster solution than this?
>
> http://www.rafb.net/paste/results/c6T7B629.html

Hi Edward.

A nice little problem. Thank you.

The main reason for the length of your own solution is that you haven't
taken the opportunity to use hashes to store data that is parallel across
the four possible characters, so the code is about four times as long as
it needs to be!

Here is my solution. I have written it to pull data from the
pseudo-filehandle DATA, as it is unlikely that you will want your actual
data hard-coded as an array.

HTH.

Rob Dixon


use strict;
use warnings;

my %pwm;
my $n = 0;    # number of sequences read

while (<DATA>) {
  my @bases = /\S/g;    # one element per non-whitespace character
  next unless @bases;   # skip blank lines
  $n++;
  my $col = 0;
  foreach my $c (@bases) {
    $pwm{$c}[$col++]++;   # count base $c in this column
  }
}

# Turn counts into probabilities by dividing by the number of sequences.
# Columns where a base never occurred are undef and become 0.
foreach my $freq (values %pwm) {
  $_ = $_ ? $_ / $n : 0 foreach @$freq;
}

use Data::Dumper;
print Dumper \%pwm;

__END__
AAA
ATG
TTT
GTC


OUTPUT

$VAR1 = {
          'A' => [
                   '0.5',
                   '0.25',
                   '0.25'
                 ],
          'T' => [
                   '0.25',
                   '0.75',
                   '0.25'
                 ],
          'C' => [
                   0,
                   0,
                   '0.25'
                 ],
          'G' => [
                   '0.25',
                   0,
                   '0.25'
                 ]
        };
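
For reuse, the same counting idea can be sketched as a subroutine. This is only a sketch under a few assumptions that are not in the thread: the name pwm_from_seqs is made up for illustration, input is restricted to the bases A, C, G and T (upper-cased first), and every row is pre-filled with zeros so that a base which never occurs still gets a full-length row. It divides by the number of sequences, so it does not depend on how many distinct bases happen to appear.

```perl
use strict;
use warnings;

# pwm_from_seqs is a hypothetical helper name, not part of the original post.
# Takes a list of equal-length strings and returns a hashref of
# base => [ probability of that base in each column ].
sub pwm_from_seqs {
    my @seqs = @_;
    return {} unless @seqs;

    my $len = length $seqs[0];

    # Assumption: only A, C, G and T are valid. Pre-fill each row with
    # zeros so that absent bases still produce a row of the full length.
    my %pwm = map { ( $_ => [ (0) x $len ] ) } qw(A C G T);

    for my $seq (@seqs) {
        die "Sequences must all have length $len\n"
            unless length($seq) == $len;
        my $col = 0;
        for my $base ( split //, uc $seq ) {
            die "Unexpected character '$base'\n" unless exists $pwm{$base};
            $pwm{$base}[ $col++ ]++;    # count this base in this column
        }
    }

    # Convert counts to probabilities: divide by the number of sequences.
    my $n = @seqs;
    for my $row ( values %pwm ) {
        $_ /= $n for @$row;
    }
    return \%pwm;
}

# Example with the four strings from the question.
my $pwm = pwm_from_seqs(qw(AAA ATG TTT GTC));
print join( ' ', $_, @{ $pwm->{$_} } ), "\n" for sort keys %$pwm;
```

Run as is, this prints one line per base in sorted order: "A 0.5 0.25 0.25", "C 0 0 0.25", "G 0.25 0 0.25" and "T 0.25 0.75 0.25". Dying on an unexpected character is a design choice for the sketch; data containing ambiguity codes such as N would need a different policy.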