On 09/06/2006 04:02 AM, Wijaya Edward wrote:
Dear Experts,
I am looking for a really efficient way to compute a position weight matrix (PWM) from a set of strings. Within each set, all strings have the same length. Basically, a PWM gives the frequency (or probability) with which each base [ATCG] occurs at each position (column) of the strings. For example, take the set of strings below:

                    AAA
                    ATG
                    TTT
                    GTC

Note that the strings in a set may be longer than 3. The set above would give the following result:

$VAR1 = {
            'A' => [2,1,1],
            'T' => [1,3,1],
            'C' => [0,0,1],
            'G' => [1,0,1]
         };
So the size of each array is the same as the length of the strings. In my case I need a variation of this, namely the probability that each base occurs at a particular position:

$VAR =     {
            'A' => ['0.5','0.25','0.25'],
            'T' => ['0.25','0.75','0.25'],
            'C' => ['0','0','0.25'],
            'G' => ['0.25','0','0.25']
          }
At this link you can find my incredibly naive and inefficient code. Can anybody suggest a better and faster solution than this?
http://www.rafb.net/paste/results/c6T7B629.html

Thanks and Regards,
Edward WIJAYA
SINGAPORE


Although I'm sure that smarter posters than I will turn this into a one-liner, I think that my solution is not so atrocious:

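use strict;
use warnings;
use Data::Dumper;
local our @deep;   # package array, aliased to each row via the *deep glob below
local $; = ','; # A vestige of a previous version

my @data = qw(AAA ATG TTT GTC);
my @d2 = map [ split // ], @data;   # one array of characters per string

my (%hash);
for my $entry (@d2) {
    *deep = $entry;   # make @deep an alias for the current row
    for my $nx (0..$#deep) {
        $hash{$deep[$nx]}[$nx]++;   # count this base at position $nx
    }
}
# Positions where a base never occurred are undef; turn them into 0
foreach my $entry (values %hash) {
    $entry = [ map defined $_ ? $_ : 0, @$entry ];
}
print Dumper(\%hash);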
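
The code above tallies raw counts. Since the question asks for probabilities, here is one possible extension, offered only as a sketch. The sub name pwm_probabilities is hypothetical, and it assumes that all strings have the same length and that any character outside ACGT can simply be skipped. It also pre-fills every base with zeros, so a base that never shows up near the end of the strings still gets a full-length array:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): builds a table of
# per-position base probabilities from equal-length strings over ACGT.
sub pwm_probabilities {
    my @seqs = @_;
    return {} unless @seqs;

    my $len = length $seqs[0];

    # Pre-fill every base with zeros so no position is left undefined
    my %count;
    $count{$_} = [ (0) x $len ] for qw(A C G T);

    for my $seq (@seqs) {
        die "Unequal string length: $seq\n" unless length($seq) == $len;
        my @bases = split //, $seq;
        for my $pos (0 .. $#bases) {
            # Assumption: characters outside ACGT are skipped
            next unless exists $count{ $bases[$pos] };
            $count{ $bases[$pos] }[$pos]++;
        }
    }

    # Divide each count by the number of strings to get a probability
    my $n = @seqs;
    my %prob;
    for my $base (keys %count) {
        $prob{$base} = [ map { $_ / $n } @{ $count{$base} } ];
    }
    return \%prob;
}

my $pwm = pwm_probabilities(qw(AAA ATG TTT GTC));
print join(' ', $_, @{ $pwm->{$_} }), "\n" for sort keys %$pwm;
```

For the four sample strings this prints one line per base, for example "A 0.5 0.25 0.25" and "T 0.25 0.75 0.25", which matches the table in the question.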

__HTH__



