Thanks. I added some timestamp prints within the script; this piece is taking too
long: almost 5 minutes to complete!


FYI, the @strArr array contains about 1,500 string elements, so this loop runs
that many times.

The file tmp_FH_SR is 27 MB with about 300,000 lines of data.
The file tmp_FH_RL is 13 MB with about 150,000 lines of data.



I have changed the variable names to protect the actual ones.

In the first while loop, if $str is found exactly once in the file, I obtain
another field from the matching record. I then use that field to count its
occurrences in a second file, and based on that result I do further things
with $str.


        my $tmp_str;    # note: this was misspelled as $tmp_srt, so the assignment below set a different variable

        foreach my $str (@strArr)
        {
            my $count = 0;
            seek $tmp_FH_SR, 0, 0;
            while (my $line = <$tmp_FH_SR>)
            {
                chomp($line);
                if ($line =~ m/"\Q$str\E"/)   # \Q...\E protects against regex metacharacters in $str
                {
                    $count++;
                    if ($count == 1)
                    {
                        my @tmp_line_ar = split(/,/, $line);
                        $tmp_str = $tmp_line_ar[10];
                    }
                }
            }
            if ($count == 1)
            {
                seek $tmp_FH_RL, 0, 0;
                while (my $line = <$tmp_FH_RL>)
                {
                    chomp($line);
                    $count++ if $line =~ m/"\Q$tmp_str\E"/;
                }
                push(@another_str_arr, $str) if $count == 1;
            }
        }




How can I make this faster? Should I read the 27 MB and 13 MB files into
memory once and work from there?
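One way to sketch the single-pass idea: read each big file once, tally every quoted token into a hash, then loop over @strArr doing constant-time hash lookups instead of re-scanning roughly 450,000 lines per element. The sub names, the %sr_count/%sr_field/%rl_count hashes, the /"([^"]+)"/ token pattern, and the quote-stripping of field 10 are all my assumptions about the data format; the field index 10 and the push condition come from the loop above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %sr_count;   # quoted token => number of lines it appears on in tmp_FH_SR
my %sr_field;   # quoted token => field 10 of the first line it appears on
my %rl_count;   # quoted token => number of occurrences in tmp_FH_RL

# One pass over the 27 MB file: tally tokens and remember field 10.
sub scan_sr {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /,/, $line;
        (my $f10 = $fields[10] // '') =~ tr/"//d;    # strip quotes, if any (assumption)
        my %seen;                                    # count each token once per line,
        $seen{$1} = 1 while $line =~ /"([^"]+)"/g;   # matching the per-line m// above
        for my $tok (keys %seen) {
            $sr_count{$tok}++;
            $sr_field{$tok} //= $f10;                # keep field 10 from the first match
        }
    }
}

# One pass over the 13 MB file: only zero-vs-nonzero matters here.
sub scan_rl {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        $rl_count{$1}++ while $line =~ /"([^"]+)"/g;
    }
}

# With both hashes built, each $str costs two hash lookups instead of
# a pass over 450,000 lines.
sub select_strings {
    my (@strArr) = @_;
    my @another_str_arr;
    foreach my $str (@strArr) {
        next unless ($sr_count{$str} // 0) == 1;   # exactly once in SR
        my $tmp_str = $sr_field{$str};
        push @another_str_arr, $str
            unless $rl_count{$tmp_str};            # and zero hits in RL
    }
    return @another_str_arr;
}
```

The memory cost is one hash entry per distinct quoted token rather than the whole files, and each file is read exactly once instead of about 1,500 times.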


________________________________
 From: Jim Gibson <jimsgib...@gmail.com>
To: perl list <beginners@perl.org> 
Sent: Monday, August 20, 2012 4:04 PM
Subject: Re: re-reading from already read file handle
 

On Aug 20, 2012, at 1:39 PM, Rajeev Prasad wrote:

> thank you. seek did the job.
> 
> By the way, can this be made any better?
> 
> I just want to find out how many records the string was found in:
> 
>             my $count=0;
>             seek $tmp_FH,0,0;
>             while (<$tmp_FH>)
>             {
>                 my $line=$_;chomp($line);
>                 if ($line=~m/\"$str\"/) {$count++;}   # in the file, $str would be in quotes
>             }


Two possible efficiency improvements:

1. Since you are just searching each line for a string, you can skip the 
chomp($line).

2. Searching for a fixed string might be done faster by using the index() 
function rather than invoking the regular expression engine:

    $count++ if index($line,"\"$str\"") != -1;

Neither of these modifications is likely to produce a significant speed-up. You
should benchmark them to see what the difference actually is and whether it is
worth it in your case. The speed bottleneck will still be reading the file from
a physical device.
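A quick way to run that comparison is cmpthese() from the core Benchmark module. This is only a sketch: the sample line and key below are made up, and I've added \Q...\E so any regex metacharacters in $str are matched literally.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data; substitute a real line and search key.
my $str  = 'SOME_KEY';
my $line = 'field1,"SOME_KEY",field3,field4';

# Run each candidate for about 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { my $c = 0; $c++ if $line =~ m/"\Q$str\E"/ },
    index => sub { my $c = 0; $c++ if index($line, qq{"$str"}) != -1 },
});
```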

If possible, you should combine searching for the string with whatever 
processing you are doing on the file during the first read, and only read the 
file once.


--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
