Luba Pardo wrote:
Dear list:
Hello,
I wrote a script that takes a list of ids from an input file and stores these
in an array in a pairwise-like manner (if the total list is n then the array is
(2 ^n)-n). I need to extract for each pair of ids a certain value from a
huge file that contains the pair of ids and the value (format of the
file: col1 col2 id1 id2 value).
The script works but it takes too long, especially because the second file
is too big (more than 600 MB).
I am assuming that "top_22.txt" is the second file?
I would like to increase the speed of the script, but I haven't quite worked
out what the best way to do it is.
Any tip?
The biggest tip from glancing at your code: instead of reading the whole file
into an array and then looping over it by index:
open FH, 'file' or die $!;
my @array = <FH>;
for (my $m=0; $m<=$#array; $m++) {
Do:
open FH, 'file' or die $!;
while ( <FH> ) {
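The reason this helps: slurping with `my @array = <FH>` holds every line of the file in memory at once, while the `while` loop holds only the current line. A minimal sketch of the line-at-a-time pattern follows. The helper name `count_matches` is made up for illustration, and the assumption that the ids sit in fields 3 and 4 is taken from the indexes used in the posted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the lines of $file whose id fields
# (assumed to be whitespace-separated fields 3 and 4) contain $id.
sub count_matches {
    my ( $file, $id ) = @_;
    open my $fh, '<', $file or die "cannot open '$file' $!\n";
    my $count = 0;
    while ( my $line = <$fh> ) {
        # only the current line is held in memory here
        my ( $id1, $id2 ) = ( split ' ', $line )[ 3, 4 ];
        next unless defined $id2;    # skip lines with too few fields
        $count++ if $id1 eq $id || $id2 eq $id;
    }
    close $fh;
    return $count;
}
```

Called as `count_matches( 'top_22.txt', $some_id )`, this reads the file once without ever building an array of all its lines.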
Thanks in advance,
L. Pardo
ps, I am attaching the script
[ snipped code ]
my @temp2 = split/\s+/,$a3[$x];
if($temp1[0] eq $temp2[3] & $temp1[1] eq $temp2[4] || $temp1[0] eq
$temp2[4] & $temp1[1] eq $temp2[3]) {
Did you really want to use the bit-wise '&' operator and not the logical '&&'
operator?
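To illustrate the difference: `&` combines the bits of its operands, while `&&` tests whether both are true and stops at the first false one. With the results of `eq` the two tend to agree, so the test may well have worked, but they differ for other values. A small sketch, with values chosen only for illustration:

```perl
use strict;
use warnings;

my ( $x, $y ) = ( 2, 1 );    # both are true in boolean context

my $bitwise = $x & $y;       # binary 10 & 01 is 0, which is false
my $logical = $x && $y;      # both operands are true, so this is true

print "bitwise: ", ( $bitwise ? "true" : "false" ), "\n";    # bitwise: false
print "logical: ", ( $logical ? "true" : "false" ), "\n";    # logical: true
```

Because `&&` short-circuits, the second `eq` is not even evaluated when the first one fails, which saves a little work on every line of a large file.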
print "$temp2[3], $temp2[4],$temp2[5],$temp2[6]\n";
Shouldn't that be:
print OUT2 "$temp2[3], $temp2[4],$temp2[5],$temp2[6]\n";
instead? Otherwise you are not using the OUT2 filehandle anywhere.
} elsif($x == $#a3) {
This should be more efficient:
#!/usr/bin/perl
use strict;
use warnings;
my @a4;
{
    open my $SN, '<', 'file_22.txt' or die "cannot open 'file_22.txt' $!\n";
    my %nuc;
    my @arr;
    while ( <$SN> ) {
        my ( $val, $key ) = ( split )[ 1, 3 ];
        $nuc{ $key } = $val;
        push @arr, $key;
    }
    close $SN;
    print scalar @arr, "\n";
    for my $i ( 0 .. 50 ) {
        for my $n ( $i .. 50 ) {
            if ( $nuc{ $arr[ $i ] } >= $nuc{ $arr[ $n ] } - 200_000 ) {
                push @a4, [ @arr[ $i, $n ] ];
            }
        }
    }
}
for my $temp ( @a4 ) {
    print "OK $temp->[0],$temp->[1]\n";
}
######SECOND_PART#############
open my $LL, '<', 'top_22.txt' or die "cannot open 'top_22.txt' $!\n";
open my $OUT1, '>', 'Not_found.txt' or die "cannot open 'Not_found.txt' $!\n";
open my $OUT2, '>', 'Pairwise.txt' or die "cannot open 'Pairwise.txt' $!\n";
# I have reversed the loops so that 'top_22.txt' is only read once
while ( <$LL> ) {
    my @temp2 = ( split )[ 3 .. 6 ];
    for my $temp1 ( @a4 ) {
        # logical '&&' rather than bit-wise '&'
        if ( $temp1->[ 0 ] eq $temp2[ 0 ] && $temp1->[ 1 ] eq $temp2[ 1 ]
            || $temp1->[ 0 ] eq $temp2[ 1 ] && $temp1->[ 1 ] eq $temp2[ 0 ] ) {
            # matching pairs go to 'Pairwise.txt'
            print $OUT2 "$temp2[0], $temp2[1],$temp2[2],$temp2[3]\n";
            next;
        }
        print $OUT1 "$temp1->[0], $temp1->[1]\n";
    }
}
close $OUT2;
close $OUT1;
close $LL;
#0 == system 'gzip', 'Not_found.txt' or die "system 'gzip', 'Not_found.txt' failed: $?";
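One further option, which is a different technique from the nested loops above and untested against the real data, is to put the pairs into a hash so that each line of 'top_22.txt' costs one lookup instead of a pass over @a4. A sketch, assuming the same field positions as above; `pair_key`, `index_pairs`, `scan_file` and `not_found` are names invented here.

```perl
use strict;
use warnings;

# Order-independent key, so (a, b) and (b, a) map to the same entry.
sub pair_key {
    my ( $id1, $id2 ) = @_;
    return $id1 le $id2 ? "$id1\t$id2" : "$id2\t$id1";
}

# Build the lookup table from a list of [ id1, id2 ] array refs.
sub index_pairs {
    my @pairs = @_;
    my %wanted;
    $wanted{ pair_key( @$_ ) } = 0 for @pairs;    # 0 = not seen yet
    return \%wanted;
}

# Read $in once; print every line whose id pair is wanted to $out.
sub scan_file {
    my ( $in, $out, $wanted ) = @_;
    while ( my $line = <$in> ) {
        my @f = ( split ' ', $line )[ 3 .. 6 ];
        next unless defined $f[3];                # too few fields
        my $key = pair_key( @f[ 0, 1 ] );
        next unless exists $wanted->{ $key };
        $wanted->{ $key }++;                      # count the hit
        print $out "$f[0], $f[1],$f[2],$f[3]\n";
    }
}

# Keys of the pairs that never appeared in the scanned file.
sub not_found {
    my ( $wanted ) = @_;
    return grep { !$wanted->{ $_ } } sort keys %$wanted;
}
```

With `my $wanted = index_pairs( @a4 );` and the `$LL` and `$OUT2` filehandles passed to `scan_file`, the list returned by `not_found( $wanted )` can then be written to 'Not_found.txt' once per pair after the scan finishes.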
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall