Re: rewriting a single column in an open file... more efficient IO

Uri Guttman Mon, 09 May 2011 17:52:58 -0700

>>>>> "D" == D  <demianricca...@gmail.com> writes:


  D> I would like to learn an efficient way to change a single column in
  D> a file that is accessed by an external program after the column is
  D> changed each time.  open write close is what I have been using.  I
  D> thought that tying could help speed it up.  While I didn't dig in
  D> too deeply, my split entry, change value and rejoin didn't seem to
  D> gain me much speed.  The test file and script are pasted below.  In
  D> practice the file will be about 100 lines long and the 3rd column
  D> will be rewritten thousands of times.  Is there a more efficient
  D> approach?

  D> -------------------------------
  D> use IO::All;

that is an overkill module IMO. it does everything but you don't need
everything. in fact iirc it uses File::Slurp inside to read whole
files. using that directly should speed it up.

  D> use warnings;
  D> use strict;

  D> my $lines = io('test')->new;

        use File::Slurp ;
        my $lines = read_file( 'test', { array_ref => 1 } ) ;

just benchmark that against the io() call and see which is faster. use
the Benchmark module that comes with perl.
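A minimal core-Perl sketch of how Benchmark's cmpthese() compares two approaches. IO::All and File::Slurp may not be installed everywhere, so the two subs here are hypothetical stand-ins (a line-by-line read versus a single slurp) for the io() and read_file() calls; the 100-line, three-column test file is also an assumed layout, since the real one isn't shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Throwaway 100-line file with three whitespace-separated columns and
# the third column set to 0 (assumed layout; the real file is not shown).
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "a$_   b$_   0\n" for 1 .. 100;
close $fh or die "close $file: $!";

# Hypothetical stand-in #1: read the file line by line into an array ref.
sub read_lines {
    open my $in, '<', $file or die "open $file: $!";
    my @lines = <$in>;
    close $in;
    return \@lines;
}

# Hypothetical stand-in #2: slurp the whole file into one scalar.
sub read_slurp {
    open my $in, '<', $file or die "open $file: $!";
    local $/;    # undef record separator: one read returns the whole file
    my $text = <$in>;
    close $in;
    return \$text;
}

# Run each sub for at least 1 CPU second and print a rate comparison table.
cmpthese(-1, {
    lines => \&read_lines,
    slurp => \&read_slurp,
});
```

To compare the calls actually discussed here, replace the bodies of the two subs with the io('test') and read_file('test', { array_ref => 1 }) versions.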

  D> print "$_ \n" foreach @$lines;

calling print once per line is slow. why are you even printing the lines here?
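If the lines really do need printing, one print call can take the whole list. This sketch uses made-up sample lines in place of @$lines and writes to an in-memory filehandle only so the output can be checked; in real code it would go to STDOUT. It keeps the trailing space before each newline that the original loop produces.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for @$lines from the question.
my @lines = map { "a$_   b$_   0" } 1 .. 5;

# Collect output in a scalar so it can be inspected.
open my $out, '>', \my $buf or die "open: $!";

# A single print call with the whole list, instead of one call per line.
print {$out} map { "$_ \n" } @lines;
close $out;

print $buf;
```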

  D> my @tmp;
  D> foreach (0 .. $#{$lines}){
  D>  $tmp[$_] = $_;
  D> }

why build up an array that just holds its own indexes? you already have
each index as $_ in the map below.
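Here is the same map with @tmp dropped, using the index in $_ directly. The three sample lines are hypothetical. This still splits and rejoins every line; it only removes the redundant index array, so it is not the much faster approach that follows.

```perl
use strict;
use warnings;

# Hypothetical three-column lines standing in for @$lines.
my $lines = [ "a b 0", "c d 0", "e f 0" ];

# Inside the map, $_ is already the line index, so no @tmp copy is needed.
@$lines = map {
    my @sh = split ' ', $lines->[$_];   # ' ' splits on runs of whitespace
    join('   ', @sh[0, 1], $_);         # third field becomes the index
} 0 .. $#$lines;

print "$_\n" for @$lines;
```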

  D> @$lines = map {
  D>                 my @sh = split /\s+/, $lines->[$_];
  D>                 join("   ",$sh[0],$sh[1],$tmp[$_]);
  D>               } 0 .. $#{$lines};

that may seem fast to you because it is one line, but it can be made MUCH
faster and with much less code. you are doing work there that doesn't
need to be done at all. i have several questions about the data and the
change logic:

is the file well defined with whitespace separation? is the third field
always the last non-whitespace field on the line? is the value of the
third field always 0 to start? is it always replaced by its line index
(counting from 0)? if those are all yes, then you can do this and it
will blow away your example in speed (untested):


use File::Slurp ;

my $text ;
read_file( 'test', { buf_ref => \$text } ) ;

my $ind = 0 ;
# [ \t]* instead of \s* so the match can't run past a newline
$text =~ s/\b0[ \t]*$/$ind++/emg ;

write_file( 'test', { buf_ref => \$text } ) ;


done.

benchmark that and i expect it to be seriously faster. the key is that a
single s/// op runs over the whole file, so no perl-level looping is
done per line (the looping happens inside the regex engine thanks to the
/g option). also there is no pulling apart each line and putting it back
together.
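The same whole-buffer substitution can be tried on an in-memory string, with no File::Slurp or real file needed. The three-line sample is made up and assumes the layout asked about above (whitespace-separated fields, last field 0). \b0 keeps a field like 10 from matching, and [ \t]* keeps the match from crossing a newline.

```perl
use strict;
use warnings;

# Hypothetical file contents: three columns, third column always 0.
my $text = "a1   b1   0\na2   b2   0\na3   b3   0\n";

my $ind = 0;
# /m: $ matches at each line end; /g: every line; /e: evaluate $ind++
# as code, so the replacements are 0, 1, 2, ...
my $count = ($text =~ s/\b0[ \t]*$/$ind++/emg);

print $text;
```

s/// returns the number of substitutions made, which here is one per line.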

and on top of that there is a beta version of File::Slurp which has an
edit_file() call. using it would look like this:

use File::Slurp ;

my $ind = 0 ;
edit_file { s/\b0[ \t]*$/$ind++/emg } 'test' ;

done. :)

that version should be released pretty soon.
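Until then, the same idea can be approximated in core Perl. The helper below, edit_in_place, is hypothetical and is not File::Slurp's implementation: it slurps the file, runs the block with $_ set to the whole contents, and writes the result back. The temp file and its contents are also made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not File::Slurp's code): slurp, edit $_, write back.
# The (&$) prototype allows the block-first call syntax used below.
sub edit_in_place (&$) {
    my ($code, $file) = @_;
    open my $in, '<', $file or die "open $file: $!";
    my $text = do { local $/; <$in> };
    close $in;

    { local $_ = $text; $code->(); $text = $_; }   # block edits $_

    open my $out, '>', $file or die "write $file: $!";
    print {$out} $text;
    close $out or die "close $file: $!";
}

# Made-up two-line test file.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "a1   b1   0\na2   b2   0\n";
close $fh or die "close $file: $!";

my $ind = 0;
edit_in_place { s/\b0[ \t]*$/$ind++/emg } $file;
```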

uri

-- 
Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------
