Luba Pardo wrote:
Dear list:

Hello,

I wrote a script that takes a list of ids from an input file and stores them
in an array in a pairwise-like manner (if the total list is n then the array is
(2 ^n)-n). I need to extract, for each pair of ids, a certain value from a
huge file that contains the pair of ids and the value (format of the
file: col1 col2  id1 id2  value).
The script works but it takes too long, especially because the second file
is too big (more than 600 MB).

I am assuming that "top_22.txt" is the second file?

I would like to increase the speed of the script, but I haven't quite worked
out what the best way to do it is.
Any tips?

The biggest tip from glancing at your code is instead of doing:

open FH, 'file' or die $!;
my @array = <FH>;
for (my $m=0; $m<=$#array; $m++) {

Do this, which reads one line at a time instead of loading the whole file
into memory first:

open FH, 'file' or die $!;
while ( <FH> ) {
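
As a rough, self-contained sketch of the line-by-line pattern (the temporary
file and the count_lines() helper are made up for illustration, they are not
from your script), only one line is held in memory at any time:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Write a small stand-in for the big file (hypothetical data)
my ( $tmp, $name ) = tempfile( UNLINK => 1 );
print $tmp "c0 c1 c2 idA idB 0.5 x\n" for 1 .. 1000;
close $tmp or die "cannot close '$name' $!\n";

# Count the lines while holding only the current line in memory
sub count_lines {
    my ( $file ) = @_;
    open my $FH, '<', $file or die "cannot open '$file' $!\n";
    my $count = 0;
    while ( my $line = <$FH> ) {
        $count++;
        }
    close $FH;
    return $count;
    }

print count_lines( $name ), "\n";
```

With 'my @array = <FH>;' the whole file has to fit in memory before the loop
even starts, which is what hurts with a file of more than 600 MB.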


Thanks in advance,
L. Pardo
ps, I am attaching the script

[ snipped code ]

          my @temp2 = split/\s+/,$a3[$x];
               if($temp1[0] eq $temp2[3] & $temp1[1] eq $temp2[4] || $temp1[0] eq $temp2[4] & $temp1[1] eq $temp2[3]) {

Did you really want to use the bit-wise '&' operator and not the logical '&&' operator?
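
A small sketch of the difference (check() and the counters are hypothetical
names, used only to make the evaluation order visible): '&' always evaluates
both comparisons, while '&&' stops as soon as the left side is false.

```perl
use strict;
use warnings;

my $calls = 0;

# Compare two strings and record that a comparison took place
sub check { $calls++; return $_[0] eq $_[1] }

# Bitwise '&': both sides are always evaluated
$calls = 0;
my $bit       = check( 'a', 'b' ) & check( 'c', 'c' );
my $bit_calls = $calls;

# Logical '&&': the right side is skipped when the left side is false
$calls = 0;
my $log       = check( 'a', 'b' ) && check( 'c', 'c' );
my $log_calls = $calls;

print "& made $bit_calls calls, && made $log_calls calls\n";
```

Both forms come out false here, so '&' appears to work in your test, but it
does the second comparison on every pass and that adds up in a tight loop.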

         print "$temp2[3], $temp2[4],$temp2[5],$temp2[6]\n";

Shouldn't that be:

          print OUT2 "$temp2[3], $temp2[4],$temp2[5],$temp2[6]\n";

instead?  Otherwise you are not using the OUT2 filehandle anywhere.

           }   elsif($x == $#a3) {




This should be more efficient:

#!/usr/bin/perl
use strict;
use warnings;

my @a4;

{   open my $SN, '<', 'file_22.txt' or die "cannot open 'file_22.txt' $!\n";

    my %nuc;
    my @arr;
    while ( <$SN> ) {
        my ( $val, $key ) = ( split )[ 1, 3 ];
        $nuc{ $key } = $val;
        push @arr, $key;
        }
    close $SN;
    print scalar @arr, "\n";

    for my $i ( 0 .. 50 ) {
        for my $n ( $i .. 50 ) {
            if ( $nuc{ $arr[ $i ] } >= $nuc{ $arr[ $n ] } - 200_000 ) {
                push @a4, [ @arr[ $i, $n ] ];
                }
            }
        }
    }

for my $temp ( @a4 ) {
    print "OK $temp->[0],$temp->[1]\n";
    }


######SECOND_PART#############

open my $LL, '<', 'top_22.txt' or die "cannot open 'top_22.txt' $!\n";

open my $OUT1, '>', 'Not_found.txt' or die "cannot open 'Not_found.txt' $!\n";
open my $OUT2, '>', 'Pairwise.txt'  or die "cannot open 'Pairwise.txt' $!\n";


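# I have reversed the loops so that 'top_22.txt' is only read once
my %found;    # indexes into @a4 of the pairs seen in 'top_22.txt'
while ( <$LL> ) {
    my @temp2 = ( split )[ 3 .. 6 ];
    for my $i ( 0 .. $#a4 ) {
        my $temp1 = $a4[ $i ];
        if (   $temp1->[ 0 ] eq $temp2[ 0 ] && $temp1->[ 1 ] eq $temp2[ 1 ]
            || $temp1->[ 0 ] eq $temp2[ 1 ] && $temp1->[ 1 ] eq $temp2[ 0 ] ) {

            print $OUT2 "$temp2[0], $temp2[1],$temp2[2],$temp2[3]\n";
            $found{ $i } = 1;
            }
        }
    }

# A pair is only known to be missing once the whole file has been read
for my $i ( 0 .. $#a4 ) {
    print $OUT1 "$a4[$i][0], $a4[$i][1]\n" unless $found{ $i };
    }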

close $OUT2;
close $OUT1;

close $LL;

#0 == system 'gzip', 'Not_found.txt' or die "system 'gzip', 'Not_found.txt' failed: $?";
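
If that is still too slow, one further idea (not something your script or the
code above does, so treat it as an untested sketch) is to replace the loop
over @a4 for every line with a hash lookup. The sample ids, the sample lines
and the pair_key() helper below are hypothetical stand-ins:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for @a4 and for the lines of 'top_22.txt'
my @a4    = ( [ 'rs1', 'rs2' ], [ 'rs3', 'rs4' ] );
my @lines = ( "c0 c1 c2 rs2 rs1 0.8 x\n", "c0 c1 c2 rs5 rs6 0.1 y\n" );

# Sorting the two ids makes (id1,id2) and (id2,id1) share one key
sub pair_key { return join "\t", sort @_ }

# Every wanted pair starts with a hit count of zero
my %want;
$want{ pair_key( @$_ ) } = 0 for @a4;

my @hits;
for ( @lines ) {
    my @temp2 = ( split )[ 3 .. 6 ];
    my $key   = pair_key( @temp2[ 0, 1 ] );
    next unless exists $want{ $key };    # one lookup instead of a loop
    $want{ $key }++;
    push @hits, "$temp2[0], $temp2[1],$temp2[2],$temp2[3]";
    }

# Pairs that never appeared in the input
my @not_found = grep { !$want{ $_ } } sort keys %want;

print "$_\n" for @hits;
print "not found: $_\n" for @not_found;
```

In the real script the 'for ( @lines )' loop would become 'while ( <$LL> )',
and the two print statements would write to $OUT2 and $OUT1.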



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
