venkates wrote:
Hi all,
Hello,
I am trying to filter files from a directory (code provided below) by
comparing the contents of each file with a hash ref (a parsed ID map
file provided as an argument). The code works but is extremely
slow. The .csv files (81 files) that I am reading are not very large
(the largest file is 183,258 bytes). I would appreciate it if you could
suggest improvements to the code.

    sub filter {
        my ( $pazar_dir_path, $up_map, $output ) = @_;
        croak "Not enough arguments! " if ( @_ < 3 );
        my $accepted = 0;
        my $rejected = 0;
        opendir DH, $pazar_dir_path
            or croak ("Error in opening directory '$pazar_dir_path': $!");
        open my $OUT, '>', $output
            or croak ("Cannot open file for writing '$output': $!");
        while ( my @data_files = grep(/\.csv$/,readdir(DH)) ) {
            my @records;
            foreach my $file ( @data_files ) {
                open my $FH, '<', "$pazar_dir_path/$file"
                    or croak ("Cannot open file '$file': $!");
                while ( my $data = <$FH> ) {
                    chomp $data;
                    my $record_output;
                    @records = split /\t/, $data;
                    foreach my $up_acs ( keys %{$up_map} ) {
                        foreach my $ensemble_id ( @{$up_map->{$up_acs}{'Ensembl_TRS'}} ) {
                            if ( $records[1] eq $ensemble_id ) {
                                $record_output = join( "\t", @records );
                                print $OUT "$record_output\n";
                                $accepted++;
                            }
                            else {
                                $rejected++;
                                next;
                            }
                        }
                    }
                }
                close $FH;
            }
        }
        close $OUT;
        closedir (DH);
        print "accepted records: $accepted\n, rejected records: $rejected\n";
        return $output;
    }

$output doesn't change inside the sub so why are you returning it?
I couldn't see any way to improve the basic algorithm, but I did remove
some unnecessary code and shorten a few things:

    sub filter {
        croak "Not enough arguments! " if @_ < 3;
        my ( $pazar_dir_path, $up_map, $output ) = @_;
        my $accepted = 0;
        my $rejected = 0;
        opendir my $DH, $pazar_dir_path
            or croak "Error in opening directory '$pazar_dir_path': $!";
        open my $OUT, '>', $output
            or croak "Cannot open file for writing '$output': $!";
        foreach my $file ( grep /\.csv$/, readdir $DH ) {
            open my $FH, '<', "$pazar_dir_path/$file"
                or croak "Cannot open file '$file': $!";
            while ( my $data = <$FH> ) {
                my $key = ( split /\t/, $data )[ 1 ];
                foreach my $up_acs ( values %$up_map ) {
                    foreach my $ensemble_id ( @{ $up_acs->{ Ensembl_TRS } } ) {
                        if ( $key eq $ensemble_id ) {
                            print $OUT $data;
                            $accepted++;
                        }
                        else {
                            $rejected++;
                        }
                    }
                }
            }
        }
        print "accepted records: $accepted\n, rejected records: $rejected\n";
    }
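
If the two inner loops still dominate the run time, one option worth trying is to flatten every Ensembl_TRS list into a single lookup hash once, before any file is read, so that each record costs one hash lookup instead of a scan over the whole map. What follows is only a minimal sketch under assumptions: it assumes $up_map has the shape used above ($up_map->{...}{Ensembl_TRS} is an array ref), the helper names build_id_lookup and is_wanted are hypothetical, and the sample IDs and lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten every Ensembl_TRS list in the ID map
# into one hash so that membership becomes a single lookup.
sub build_id_lookup {
    my ($up_map) = @_;
    my %wanted;
    foreach my $entry ( values %$up_map ) {
        # Entries without an Ensembl_TRS list contribute nothing.
        $wanted{$_} = 1 for @{ $entry->{Ensembl_TRS} || [] };
    }
    return \%wanted;
}

# Hypothetical helper: returns 1 if the second tab-separated field
# of a line is one of the wanted IDs, 0 otherwise.
sub is_wanted {
    my ( $wanted, $line ) = @_;
    chomp $line;    # works on a copy; the caller's line is untouched
    my $key = ( split /\t/, $line )[1];
    return ( defined $key && exists $wanted->{$key} ) ? 1 : 0;
}

# Made-up sample data in the same shape as $up_map.
my $up_map = {
    P12345 => { Ensembl_TRS => [ 'ENST0001', 'ENST0002' ] },
    Q67890 => { Ensembl_TRS => ['ENST0003'] },
};
my $wanted = build_id_lookup($up_map);

# Stand-in for the lines read from the .csv files.
my ( $accepted, $rejected ) = ( 0, 0 );
foreach my $line ( "a\tENST0002\tx\n", "b\tENST9999\ty\n" ) {
    if ( is_wanted( $wanted, $line ) ) { $accepted++ }
    else                               { $rejected++ }
}
print "accepted records: $accepted, rejected records: $rejected\n";
```

Note that this changes the counting: each record is accepted or rejected exactly once, whereas the nested loops above bump $rejected for every ID that fails to match and print a record once per matching ID. Whether that difference matters depends on what the counts are meant to report.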
HTH.
John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein