On Jun 11, 2012, at 7:31 AM, venkates wrote: > Hi all, > > I am trying to filter files from a directory (code provided below) by > comparing the contents of each file with a hash ref (a parsed id map file > provided as an argument). The code is working however, is extremely slow. > The .csv files (81 files) that I am reading are not very large (largest file > is 183,258 bytes). I would appreciate if you could suggest improvements to > the code. > > sub filter { > my ( $pazar_dir_path, $up_map, $output ) = @_; > croak "Not enough arguments! " if ( @_ < 3 ); > > my $accepted = 0; > my $rejected = 0; > > opendir DH, $pazar_dir_path or croak ("Error in opening directory > '$pazar_dir_path': $!"); > open my $OUT, '>', $output or croak ("Cannot open file for writing > '$output': $!"); > while ( my @data_files = grep(/\.csv$/,readdir(DH)) ) {
This loop will only be executed one time, so it need not be a loop at all. readdir in list context (as established by grep) will return a list of all entries in the directory the first time it i called. Therefore, the second time this loop is executed, readdir will return an empty list. > my @records; There doesn't seem to be any reason to declare @records here, as it is not used outside of its innermost scope. > foreach my $file ( @data_files ) { > open my $FH, '<', "$pazar_dir_path/$file" or croak ("Cannot open > file '$file': $!"); > while ( my $data = <$FH> ) { > chomp $data; > my $record_output; > @records = split /\t/, $data; You can declare @records here: my @records = split /\t, $data; > foreach my $up_acs ( keys %{$up_map} ) { You are having to iterate over the entire hash here because you are not looking for the primary hash key, but an entry in a nested hash. One way to speed up your program would be to create a temporary hash that contains the entries of your nested hash: my %ensembles; for my $up_acs ( keys %{$p_map} ) { $ensembles{$up_map->{$up_acs}{'Ensembl_TRS'}} = 1; } Do that once at the beginning of the routine (or in the calling routine and pass that hash instead of the full hash, if the routine is called more than once an the hash doesn't change). > foreach my $ensemble_id ( > @{$up_map->{$up_acs}{'Ensembl_TRS'}} ){ > if ( $records[1] eq $ensemble_id ) { Now, instead of iterating over the %{$up_map} hash, you can test directly for the entry: if( $ensembles{$records[1] ) { ... ) > $record_output = join( "\t", @records ); > print $OUT "$record_output\n"; > $accepted++; > } > else { > $rejected++; > next; > } > } > } > } > close $FH; > } > } > close $OUT; > closedir (DH); > print "accepted records: $accepted\n, rejected records: $rejected\n"; > return $output; > } -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/