On Friday, October 4, 2002, at 09:21  AM, Jerry Preston wrote:

> Hi!,

Howdy.

> I am look for a better way and a faster way to deal with a 4 - 8 meg  
> data
> file.   This file has been saved as an .cvs file for excel to read in.

What counts as "better" is pretty open to interpretation, so here's my
interpretation.

> All I am interested in is the first  three cells of ',' delimited data.
>
> Die,Row 0, Column 11
> Test Result,1
>  Score,1
>  PMark Score,0
> k Score,0
> Score,0
>
> Defects,0
>
> Mark Measurements,276
> Measurement
> 0,0,8.030399,740.998413,21.542923,16.721996,817.562500,22.048611,881.06 
> 2500,
> 29.847174,11.604215,17.210899,685.522644,16.721996,0,0
> Measurement
> 1,1,12.346605,804.399353,25.516476,8.607447,817.562500,8.607447,881.062 
> 500,2
> 6.055706,28.836847,20.028336,748.923584,9.931009,0,0
>
>   open( FI, $file_path )   || die "unable to open $file_path $!\n";
>   @file_data = <FI>;
>   close FI;

That array assignment above "slurps" the entire file into memory.  Let's
not do that.  Remove it and the close line.

>   LINE: foreach $_ ( @file_data ) {

Then we can make a simple change right here:

LINE:  while (<FI>) {

This reads one line per iteration and assigns it to $_, so we process
each line as it comes in instead of holding a huge file in memory.  Much
better.
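
Here's a minimal sketch of that loop that runs on its own.  The $sample
string, the lexical $fh handle and the $lines counter are my stand-ins,
not part of your script; an in-memory filehandle takes the place of your
real file.

```perl
use strict;
use warnings;

# Stand-in for the real data file (an assumption, so the sketch is runnable).
my $sample = "Die,Row 0, Column 11\nTest Result,1\nDefects,0\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open( my $fh, '<', \$sample ) or die "unable to open sample data: $!\n";

my $lines = 0;
while ( my $line = <$fh> ) {    # one line per iteration, never the whole file
    chomp $line;
    $lines++;
}
close $fh;

print "read $lines lines\n";
```

With a real file you would open $file_path instead of \$sample; the loop
itself stays the same.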

>     if( /^Die,Row/ ) {
>       ($row, $col) = /(\d+)/g;
>     }

Better would be to match the exact layout of the line and capture both
numbers explicitly:

if (/^Die,Row (\d+), Column (\d+)/) {
        ($row, $col) = ($1, $2);
}
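
As a quick standalone check of that pattern, here is a small sketch run
against the Die line from your sample data.  The $line variable is just
my placeholder for whatever $_ holds in your loop.

```perl
use strict;
use warnings;

# Sample line copied from the data in the original message.
my $line = "Die,Row 0, Column 11";

my ( $row, $col );
if ( $line =~ /^Die,Row (\d+), Column (\d+)/ ) {
    # $1 and $2 hold the two captured groups, in order.
    ( $row, $col ) = ( $1, $2 );
}

print "row=$row col=$col\n";    # prints "row=0 col=11"
```

Unlike a bare /(\d+)/g, this only sets the variables when the whole line
has the shape we expect.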

>     if( /^PMark Measurements/ ) {
>       ($cnt) = /(\d+)/g;

The if should probably be an elsif here, so it's only checked if the
first if failed.  In other words, it's a Die,Row line or a Mark
Measurements line, not both.

I believe you also have a rogue P in that pattern.

And again, better is:

elsif (/^Mark Measurements,(\d+)/) {
        $cnt = $1;
}
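
Put together, the two branches might look something like the sketch
below.  The sample data is trimmed from your message, and the %marks hash
keyed by "row,col" is only my guess at what you want to keep, so treat it
as hypothetical.

```perl
use strict;
use warnings;

# Trimmed stand-in for the real file (an assumption for this sketch).
my $sample = <<'END';
Die,Row 0, Column 11
Test Result,1
Mark Measurements,276
Die,Row 0, Column 12
Mark Measurements,0
END

open( my $fh, '<', \$sample ) or die "unable to open sample data: $!\n";

my ( $row, $col );
my %marks;    # "row,col" => Mark Measurements count (hypothetical structure)

while (<$fh>) {
    if (/^Die,Row (\d+), Column (\d+)/) {
        ( $row, $col ) = ( $1, $2 );    # remember which die we are in
    }
    elsif (/^Mark Measurements,(\d+)/) {
        $marks{"$row,$col"} = $1;       # only tested when the first if failed
    }
}
close $fh;

print "$_ => $marks{$_}\n" for sort keys %marks;
```

Each line is looked at once, handled completely, and then forgotten.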

Below this your code gets pretty confusing.  Here are some thoughts:

* use strict; at the top of the program.  This forces you to declare
your variables before you use them, making your code easier for us to
read and thus help you with.  Always, always, ALWAYS do this!

* Why are we checking the line type again below?  Let's do everything  
we need to do with a line before we move on.  (The /^Measurement/ check  
below should be /^Mark Measurement/ too, I think.)

* Yikes, a nested while (<FI>) {.  Are we reading from a file handle we
already closed?  Let's not do that.  Try to rethink your logic (or
explain it to us and let us help you rethink it) to handle one line at a
time.  That's a pretty good general rule for parsing.

* Do you realize that

@fields = split /,/, $_;

fills the @fields array with all the values on a line, which you could
then walk through?  WARNING:  This only works if commas do not appear
inside the fields, but that looks true in your example data.
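
Since you said you only care about the first three cells, here is a small
sketch of split with a list slice.  The $line value is a shortened copy
of one of your numeric lines and @first_three is my own name for the
result.

```perl
use strict;
use warnings;

# Shortened copy of a numeric line from the sample data.
my $line = "0,0,8.030399,740.998413,21.542923,16.721996\n";
chomp $line;

# The list slice keeps only the first three comma-separated fields.
my @first_three = ( split /,/, $line )[ 0 .. 2 ];

print join( ' | ', @first_three ), "\n";    # prints "0 | 0 | 8.030399"
```

If the lines are long, split also takes a limit as a third argument, so
split /,/, $line, 4 stops splitting after the third field and leaves the
rest of the line in the fourth element.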

Clean it up a bit.  Help us read it and send it back to us if you're  
still having problems.

James Gray

>       if( $cnt > $max ) {
>         $max = $cnt;
>       }
>       if( $cnt > 0 ) {
>         $row_col[ $jp++ ] = "$row,$col";
>         while( <FI> ) {
>           if( /^Die,Row/ ) {
>             ($row, $col) = /(\d+)/g;
>             $row_col[ $jp++ ] = "$row,$col";
>             next LINE;
>           }
>           $Z=0;
>           if( /^Measurement/ ) {
>             (@data) = split( /\,/ );
>             if( $data[ 2 ] > 0 ) {
>               ($meas) = ($data[0]) =~ /(\d+)/g;
>               $data{ "$row,$col" }{ $meas } = $data[2];
>               $data{ "$row,$col" }{ $meas }{ $data[1] } = $data[2];
>             }
>           }
>         }
>       }
>     }
>   }
> @file_data = ();
>
>

