On 09/06/2006 04:02 AM, Wijaya Edward wrote:
Dear Experts,
I am looking for a really efficient way to compute a position weight matrix (PWM) from a set of strings. Within each set, all strings have the same length. Basically, a PWM records the frequency (or probability) with which each base [ATCG] occurs at each position/column of the strings. For example, take the set of strings below:
AAA
ATG
TTT
GTC
(Note that the strings in a set may be longer than 3.)
This set would give the following result:
$VAR1 = {
'A' => [2,1,1],
'T' => [1,3,1],
'C' => [0,0,1],
'G' => [1,0,1]
};
So the size of each array is the same as the length of the strings.
In my case I need a variation of this, namely the probability of
each base occurring at each position:
$VAR = {
'A' => ['0.5','0.25','0.25'],
'T' => ['0.25','0.75','0.25'],
'C' => ['0','0','0.25'],
'G' => ['0.25','0','0.25']
}
At this link you can find my incredibly naive and inefficient code.
Can anybody suggest a better and faster solution than this:
http://www.rafb.net/paste/results/c6T7B629.html
Thanks and Regards,
Edward WIJAYA
SINGAPORE
Although I'm sure that smarter posters than I will turn this
into a one-liner, I think that my solution is not so atrocious:
use strict;
use warnings;
use Data::Dumper;

my @data  = qw(AAA ATG TTT GTC);
my $width = length $data[0];    # all strings in a set have the same length

# Start every base at zero in every column, so that a base missing from
# the set (or from the last columns) still gets a full-length row.
my %hash = map { $_ => [ (0) x $width ] } qw(A T C G);

for my $string (@data) {
    my @bases = split //, $string;
    # Count each base in the column where it occurs.
    for my $nx (0 .. $#bases) {
        $hash{ $bases[$nx] }[$nx]++;
    }
}

print Dumper(\%hash);
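
The script above produces raw counts, while the original question asked for the probability of each base at each position. A minimal sketch of that last step follows, assuming the strings contain only A, T, C and G and that each column should sum to 1. The sub names count_bases and to_probabilities are illustrative and do not come from the original post.

```perl
use strict;
use warnings;

# Count how often each base occurs in each column.
# Assumes all strings have the same length and contain only A, T, C, G.
sub count_bases {
    my @strings = @_;
    my $width   = length $strings[0];
    my %count   = map { $_ => [ (0) x $width ] } qw(A T C G);
    for my $string (@strings) {
        my @bases = split //, $string;
        $count{ $bases[$_] }[$_]++ for 0 .. $#bases;
    }
    return \%count;
}

# Divide every count by the number of strings to get probabilities.
sub to_probabilities {
    my ($count, $total) = @_;
    my %prob;
    for my $base (keys %$count) {
        $prob{$base} = [ map { $_ / $total } @{ $count->{$base} } ];
    }
    return \%prob;
}

my @data = qw(AAA ATG TTT GTC);
my $prob = to_probabilities( count_bases(@data), scalar @data );

for my $base (qw(A T C G)) {
    print "$base => [", join( ',', @{ $prob->{$base} } ), "]\n";
}
```

Dividing by the number of strings is enough here because every string contributes exactly one base to each column.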
__HTH__