venkates wrote:
Hi all,

Hello,

I am trying to filter files from a directory (code provided below) by
comparing the contents of each file with a hash ref (a parsed id map
file provided as an argument). The code works, but it is extremely
slow. The .csv files (81 files) that I am reading are not very large
(the largest file is 183,258 bytes). I would appreciate it if you could
suggest improvements to the code.

sub filter {
    my ( $pazar_dir_path, $up_map, $output ) = @_;
    croak "Not enough arguments! " if ( @_ < 3 );

    my $accepted = 0;
    my $rejected = 0;

    opendir DH, $pazar_dir_path
        or croak ("Error in opening directory '$pazar_dir_path': $!");
    open my $OUT, '>', $output
        or croak ("Cannot open file for writing '$output': $!");
    while ( my @data_files = grep(/\.csv$/,readdir(DH)) ) {
        my @records;
        foreach my $file ( @data_files ) {
            open my $FH, '<', "$pazar_dir_path/$file"
                or croak ("Cannot open file '$file': $!");
            while ( my $data = <$FH> ) {
                chomp $data;
                my $record_output;
                @records = split /\t/, $data;
                foreach my $up_acs ( keys %{$up_map} ) {
                    foreach my $ensemble_id ( @{$up_map->{$up_acs}{'Ensembl_TRS'}} ) {
                        if ( $records[1] eq $ensemble_id ) {
                            $record_output = join( "\t", @records );
                            print $OUT "$record_output\n";
                            $accepted++;
                        }
                        else {
                            $rejected++;
                            next;
                        }
                    }
                }
            }
            close $FH;
        }
    }
    close $OUT;
    closedir (DH);
    print "accepted records: $accepted\n, rejected records: $rejected\n";
    return $output;
}

$output doesn't change inside the sub so why are you returning it?

I couldn't see any way to improve the basic algorithm but I did remove some unnecessary code and shortened some stuff:

sub filter {
     croak "Not enough arguments! " if @_ < 3;
     my ( $pazar_dir_path, $up_map, $output ) = @_;

     my $accepted = 0;
     my $rejected = 0;

     opendir my $DH, $pazar_dir_path
         or croak "Error in opening directory '$pazar_dir_path': $!";
     open my $OUT, '>', $output
         or croak "Cannot open file for writing '$output': $!";

     foreach my $file ( grep /\.csv$/, readdir $DH ) {
         open my $FH, '<', "$pazar_dir_path/$file"
             or croak "Cannot open file '$file': $!";
         while ( my $data = <$FH> ) {
             my $key = ( split /\t/, $data )[ 1 ];
             foreach my $up_acs ( values %$up_map ) {
                 foreach my $ensemble_id ( @{ $up_acs->{ Ensembl_TRS } } ) {
                     if ( $key eq $ensemble_id ) {
                         print $OUT $data;
                         $accepted++;
                     }
                     else {
                         $rejected++;
                     }
                 }
             }
         }
     }
     print "accepted records: $accepted\n, rejected records: $rejected\n";
}
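
One possible change to the algorithm itself, offered only as a sketch: both versions above walk every Ensembl_TRS list again for every line of every file. If the IDs are collected into a lookup hash once, each line needs only a single hash lookup. The name filter_indexed and the %wanted hash are made up for this illustration, and the sketch assumes $up_map has the shape used above (each value holds an array ref under Ensembl_TRS). Note that the counts change meaning: here a record is accepted or rejected once, whereas the nested loops count every ID comparison and print a record once per matching ID.

```perl
use strict;
use warnings;
use Carp;

# Sketch: same arguments as filter() above, but the Ensembl IDs are
# collected into a lookup hash once, before any file is read.
sub filter_indexed {
    croak "Not enough arguments! " if @_ < 3;
    my ( $pazar_dir_path, $up_map, $output ) = @_;

    # Build the set of wanted IDs a single time.
    my %wanted;
    foreach my $up_acs ( values %$up_map ) {
        $wanted{$_} = 1 for @{ $up_acs->{Ensembl_TRS} || [] };
    }

    my ( $accepted, $rejected ) = ( 0, 0 );

    opendir my $DH, $pazar_dir_path
        or croak "Error in opening directory '$pazar_dir_path': $!";
    open my $OUT, '>', $output
        or croak "Cannot open file for writing '$output': $!";

    foreach my $file ( grep /\.csv$/, readdir $DH ) {
        open my $FH, '<', "$pazar_dir_path/$file"
            or croak "Cannot open file '$file': $!";
        while ( my $data = <$FH> ) {
            # Split a chomped copy so a key in the last column
            # does not carry the trailing newline.
            chomp( my $line = $data );
            my $key = ( split /\t/, $line )[1];
            if ( defined $key && $wanted{$key} ) {
                print $OUT $data;    # write the record unchanged
                $accepted++;
            }
            else {
                $rejected++;
            }
        }
        close $FH;
    }
    closedir $DH;
    close $OUT or croak "Cannot close '$output': $!";

    print "accepted records: $accepted\n, rejected records: $rejected\n";
    return ( $accepted, $rejected );
}
```

It would be called the same way as before, e.g. filter_indexed( $dir, $up_map, $outfile ). Whether this is acceptable depends on whether the per-comparison counts and the duplicate output lines of the nested-loop version are actually wanted.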



HTH.

John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.                   -- Albert Einstein
