At 07:55 PM 3/11/2004 -0600, James Edward Gray II wrote:
On Mar 11, 2004, at 10:41 AM, Price, Jason wrote:

I'm trying to optimize a script used for processing large text log files
(around 45MB).  I think I've got all the processing fairly well optimized,
but I'm wondering if there's anything I can do to speed up the initial
loading of the file.

Currently, I'm performing operations on the file one line at a time, using a
"while (<FILE>)" loop.  I'm pushing important lines into an array, so
further processing is done in memory, but the initial pass on the file is
rather time consuming.  Is there a more efficient way to work with large
text files?

If you post the script, we might be able to give some helpful suggestions.


James
Hi, I have a similar situation. Say I have a text file with 5000 columns and 3000 rows. I need to subset the data, about 20 columns to a file (~250 files in total), according to another file that tells me where to cut the columns. What is the best strategy for this: read the whole file into an AoA and then cut it, or use while(<>)? So far I use the while(<>) approach. It works, but it is slow.

Thanks,

Shiping

Here is the code:
________________________________________________________________________________________________________________
#!/usr/local/bin/perl
use warnings;
use strict;
use Math::Matrix;
use File::Basename;

### open input file for further process ###
if (@ARGV < 2){
        # No system call has failed here, so $! carries no useful message.
        die "Usage: $0 CutRegion MarkerData\n";
}
my %region;
open (Region, "$ARGV[0]") or die $!;
        while (<Region>){
                chomp;
                my ($start, $end) = split;
                $region{$start} = $end;
                # print "$start => $region{$start}\n";
        }
close Region;

my @nonMarkVar;

my $inData = $ARGV[1];
open (nonMarkdata, "$inData") or die "No such file or wrong file name!";
while (<nonMarkdata>){
        chomp;
        my @tmp = (split/\s+/)[0..4];
        push @nonMarkVar, \@tmp;
}
close nonMarkdata;


my $cnt4ext;

foreach my $key (sort {$a <=> $b} keys %region){
        my @getback;
        my $first = $key + 4;
        my $last = $region{$key} + 4;

        my @Markers;
        open (Markdata, "$inData") or die "No such file or wrong file name!";
                while (<Markdata>){
                        chomp;
                        my @tmp2 = (split/\s+/)[$first..$last];
                        push @Markers, \@tmp2;
                }
        close Markdata;

        ++$cnt4ext;
        my $basename = basename($inData,".txt");
        my $outfile = $basename.'.'.$cnt4ext;
        my @fruit;
        open (outH, ">$outfile") or die "Can not create outfile!";
        print "Processing data: Marker column $key - $region{$key}, output to $outfile\n";
        #sleep (1);
        my @trans = transpose(\@Markers);
        my @b4final;
        foreach my $j (@trans) {
                my @conv = char2num(@$j);
                push @b4final, \@conv;
        }
        my @transback = transpose(\@b4final);
        # put non marker data part and marker data back together;


        my $x = Math::Matrix->new(@nonMarkVar);
        my $y = Math::Matrix->new(@transback);
        my $m = $x->concat($y);
        # The matrix object is a blessed array of row references, so sort its
        # rows (descending on the fifth column) rather than the object itself.
        @fruit = sort { $b->[4] <=> $a->[4] } @$m;
        foreach my $j (@fruit) {
                print outH join " ", @{$j}, "\n";
        }
        close outH;
}
### subroutine to transpose matrix ###;
sub transpose {
        # transpose columns into rows ...
}
sub char2num {
        # convert char to num ...
}
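
One alternative to re-opening and re-splitting the data file for every region might be a single pass that splits each line once and appends the wanted columns to one buffer per region. The following is only a sketch under assumptions: cut_regions is a hypothetical helper, not part of the script above. It takes the regions as a list of [start, end] pairs and the number of leading non-marker columns (5 in the script above), and it leaves out the transpose, char2num, and sorting steps entirely.

```perl
use strict;
use warnings;

# Hypothetical helper: cut every region out of each line in a single pass.
#   $fh      - open filehandle on the marker data
#   $regions - array ref of [start, end] pairs (1-based marker column numbers)
#   $lead    - number of leading non-marker columns copied into every output
# Returns an array ref holding one text buffer per region.
sub cut_regions {
    my ($fh, $regions, $lead) = @_;
    my @chunks = ('') x @$regions;      # one output buffer per region
    while (my $line = <$fh>) {
        my @f = split ' ', $line;       # split each line only once
        next unless @f;                 # skip blank lines
        my $head = join ' ', @f[0 .. $lead - 1];
        for my $i (0 .. $#$regions) {
            my ($start, $end) = @{ $regions->[$i] };
            # marker column N sits at array index N + $lead - 1
            my @cut = @f[$start + $lead - 1 .. $end + $lead - 1];
            $chunks[$i] .= join(' ', $head, @cut) . "\n";
        }
    }
    return \@chunks;
}
```

Each buffer could then be written to its own output file after the loop, so the input is read once instead of once per region. The buffers together hold roughly one copy of the selected columns, which should fit in memory for a file of this size.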


