Wijaya Edward wrote:

> Dear Experts,
>
> I am looking for a really efficient way to compute a position weight matrix
> (PWM) from a set of strings. In each set the strings are all the same length.
> Basically, a PWM gives the frequency (or probability) with which each base
> [ATCG] occurs at each position/column of the strings. For example, the set
> of strings below:
>
>                     AAA
>                     ATG
>                     TTT
>                     GTC
>
> Note that the strings in a set
> may be longer than 3 characters.
>
> Would give the following result:
>
> $VAR1 =  {
>             'A' => [2,1,1],
>             'T' => [1,3,1],
>             'C' => [0,0,1],
>             'G' => [1,0,1]
>          };
>
> So the size of each array is the same as the length of the strings.
> In my case I need a variation of it, namely the probability of
> each base occurring at each position:
>
> $VAR =     {
>             'A' => ['0.5','0.25','0.25'],
>             'T' => ['0.25','0.75','0.25'],
>             'C' => ['0','0','0.25'],
>             'G' => ['0.25','0','0.25']
>           }
>
> At this link you can find my incredibly naive and inefficient code.
> Can anybody suggest a better and faster solution than this:
>
> http://www.rafb.net/paste/results/c6T7B629.html

Hi Edward.

A nice little problem. Thank you.

The main reason for the length of your own solution is that you haven't taken
the opportunity to use hashes to store data that is parallel across the four
possible characters, so the code is about four times as long as it needs to be!

Here is my solution. I have written it to pull data from the pseudo-filehandle
DATA, as it is unlikely that you will want your actual data hard-coded as an
array.

HTH.

Rob Dixon


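use strict;
use warnings;
use Data::Dumper;

my %pwm;
my $count = 0;    # number of sequences read
my $len   = 0;    # number of columns in each sequence

while (<DATA>) {
  my @bases = /\S/g or next;    # skip blank lines
  $count++;
  $len = @bases;
  my $col = 0;
  # Count each base per column; the nested arrays autovivify as needed
  $pwm{$_}[$col++]++ foreach @bases;
}

# Divide each count by the number of sequences, and fill every column
# where a base never occurs with 0 so that all arrays have the same length
foreach my $freq (values %pwm) {
  foreach my $i (0 .. $len - 1) {
    $freq->[$i] = $freq->[$i] ? $freq->[$i] / $count : 0;
  }
}

print Dumper \%pwm;
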

__END__
AAA
ATG
TTT
GTC


OUTPUT


$VAR1 = {
          'A' => [
                   '0.5',
                   '0.25',
                   '0.25'
                 ],
          'T' => [
                   '0.25',
                   '0.75',
                   '0.25'
                 ],
          'C' => [
                   0,
                   0,
                   '0.25'
                 ],
          'G' => [
                   '0.25',
                   0,
                   '0.25'
                 ]
        };
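
If the data lives in a file rather than after __END__, the same counting idea can be wrapped in a subroutine that takes any filehandle. The following is only a minimal sketch: the name pwm_from_fh is hypothetical, and it assumes the alphabet is exactly A, C, G and T, so that a base which never occurs still gets a row of zeros. The demo reads the four strings from the question through an in-memory filehandle; a real file opened with open would work the same way.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original reply): build a position
# weight matrix from any filehandle holding one sequence per line.
sub pwm_from_fh {
  my ($fh) = @_;
  my %pwm;
  my ($count, $len) = (0, 0);

  while (my $line = <$fh>) {
    my @bases = $line =~ /\S/g or next;    # skip blank lines
    $count++;
    $len = @bases if @bases > $len;
    my $col = 0;
    $pwm{$_}[$col++]++ foreach @bases;
  }
  return {} unless $count;

  # Assumption: the alphabet is A, C, G, T, so make sure each has a row
  $pwm{$_} ||= [] foreach qw(A C G T);

  # Turn counts into probabilities by dividing by the number of sequences
  foreach my $freq (values %pwm) {
    foreach my $i (0 .. $len - 1) {
      $freq->[$i] = ($freq->[$i] || 0) / $count;
    }
  }
  return \%pwm;
}

# Demo on the four strings from the question
my $data = "AAA\nATG\nTTT\nGTC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $pwm = pwm_from_fh($fh);
close $fh;

# Sort the keys so the rows always print in the same order
foreach my $base (sort keys %$pwm) {
  print "$base: @{ $pwm->{$base} }\n";
}
```

Sorting the keys is a deliberate choice here: Perl does not guarantee any particular hash order, so the rows of a plain Dumper listing may come out in a different order from run to run.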
