On Mar 11, 2004, at 10:41 AM, Price, Jason wrote:
Hi, I have a similar situation: I have a text file with 5000 columns and 3000 rows. I need to subset the data, about 20 columns per output file (~250 files in total), according to another file that tells me where to cut the columns. What is the best strategy for this? Read the whole file into an AoA and then cut it, or use while(<>)? So far I use the while(<>) approach. It works, but it is slow.
I'm trying to optimize a script used for processing large text log files (around 45MB). I think I've got all the processing fairly well optimized, but I'm wondering if there's anything I can do to speed up the initial loading of the file.
Currently, I'm performing operations on the file one line at a time, using a "while (<FILE>)" loop. I'm pushing important lines into an array, so further processing is done in memory, but the initial pass on the file is rather time consuming. Is there a more efficient way to work with large text files?
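For reference, the line-at-a-time pattern described above might look like the following minimal sketch. The filter in `$important` and the name `important_lines` are hypothetical, since the post does not say which lines count as important. The pattern is compiled once with `qr//`, and lines that fail the match are skipped before any further work is done on them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filter: which lines are "important" is an assumption.
my $important = qr/ERROR|WARN/;

# Collect the matching lines of a log file in a single pass.
sub important_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @keep;
    while (my $line = <$fh>) {
        # Cheap test first; only matching lines are chomped and stored.
        next unless $line =~ $important;
        chomp $line;
        push @keep, $line;
    }
    close $fh;
    return \@keep;
}

# Run only when invoked with a file argument.
if (@ARGV) {
    my $lines = important_lines($ARGV[0]);
    print scalar(@$lines), " important lines\n";
}
```

Whether this is any faster than the existing loop depends on what that loop does per line; timing both on the real log would settle it.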
If you post the script, we might be able to give some helpful suggestions.
James
Thanks,
Shiping
Here is the code:
________________________________________________________________________________________________________________
#!/usr/local/bin/perl
use warnings;
use strict;
use Math::Matrix;
use File::Basename;

### open input file for further process ###
if (@ARGV < 2) {
    die "Usage: $0 CutRegion MarkerData\n";
}

my %region;
open (Region, "$ARGV[0]") or die $!;
while (<Region>) {
    chomp;
    my ($start, $end) = split;
    $region{$start} = $end;
    # print "$start => $region{$start}\n";
}
close Region;

my @nonMarkVar;
my $inData = $ARGV[1];
open (nonMarkdata, "$inData") or die "No such file or wrong file name!";
while (<nonMarkdata>) {
    chomp;
    my @tmp = (split/\s+/)[0..4];
    push @nonMarkVar, [EMAIL PROTECTED];
}
close nonMarkdata;

my $cnt4ext;
foreach my $key (sort {$a <=> $b} keys %region) {
    my @getback;
    my $first = $key + 4;
    my $last = $region{$key} + 4;

    my @Markers;
    open (Markdata, "$inData") or die "No such file or wrong file name!";
    while (<Markdata>) {
        chomp;
        my @tmp2 = (split/\s+/)[$first..$last];
        push @Markers, [EMAIL PROTECTED];
    }
    close Markdata;

    ++$cnt4ext;
    my $basename = basename($inData,".txt");
    my $outfile = $basename.'.'.$cnt4ext;
    my @fruit;
    open (outH, ">$outfile") or die "Can not create outfile!";
    print "Processing data: Marker column $key - $region{$key}, output to $outfile\n";
    #sleep (1);
    my @trans = transpose([EMAIL PROTECTED]);
    my @b4final;
    foreach my $j (@trans) {
        my @conv = char2num([EMAIL PROTECTED]);
        push @b4final, [EMAIL PROTECTED];
    }
    my @transback = transpose([EMAIL PROTECTED]);

    # put non marker data part and marker data back together;
    my $x = new Math::Matrix (@nonMarkVar);
    my $y = new Math::Matrix (@transback);
    my $m = $x->concat($y);
    @fruit = $m;
    @fruit = sort { $b->[4] <=> $a->[4] } @fruit;
    foreach my $j ([EMAIL PROTECTED]) {
        print outH join " ", @{$j}, "\n";
    }
    close outH;
}

### subroutine to transpose matrix ###;
sub transpose {
    transpose columns into rows..................
}

sub char2num {
    convert char to num....................
}
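
One likely source of the slowness: the script above reopens and rereads the whole data file once per region, so with ~250 regions the 5000-column file is split ~250 times. A minimal sketch of a single-pass alternative follows. It covers only the column cutting, not the transpose, char2num or sort steps, and the names `cut_regions` and `$prefix` are hypothetical. The offset of 4 and the five leading columns are taken from the posted script. It assumes the cut lines for all regions fit in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cut every region out of the data file in one pass.
# %$region maps a start marker to its end marker, as in the posted
# script; the +4 column offset is also taken from there.
# Output naming via $prefix is a hypothetical choice.
sub cut_regions {
    my ($data_file, $region, $prefix) = @_;
    my @starts = sort { $a <=> $b } keys %$region;
    my @rows;    # $rows[$i] collects the output lines for region $i

    open my $in, '<', $data_file or die "Cannot open $data_file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my @f    = split ' ', $line;        # split each line only once
        my $head = join ' ', @f[0 .. 4];    # non-marker columns
        for my $i (0 .. $#starts) {
            my $first = $starts[$i] + 4;
            my $last  = $region->{ $starts[$i] } + 4;
            push @{ $rows[$i] }, join ' ', $head, @f[$first .. $last];
        }
    }
    close $in;

    # Write one output file per region, numbered from 1.
    my @files;
    for my $i (0 .. $#starts) {
        my $out = "$prefix." . ($i + 1);
        open my $fh, '>', $out or die "Cannot create $out: $!";
        print {$fh} "$_\n" for @{ $rows[$i] };
        close $fh or die "Cannot close $out: $!";
        push @files, $out;
    }
    return @files;
}
```

With the variables from the posted script, a call could look like `cut_regions($inData, \%region, basename($inData, ".txt"))`. Buffering per region avoids holding ~250 output handles open at once, at the cost of keeping the cut lines in memory until the input has been read.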