Pipeline Performance

Rod Adams Mon, 30 Aug 2004 13:34:34 -0700

Over on the Perl Question of the Week list (http://perl.plover.com/qotw/), we got into a discussion of the performance, or lack thereof, of pipelining in Perl 5. Randy Sims's example code, attached at the bottom of this post, demonstrates this well. The overall point is that a pipeline of list operations is significantly slower than the same logic unraveled into a for or while loop, largely because of the temporary lists that each stage must create.

My question is: can anything be done in Perl 6 to alleviate this? There seem to be basically two kinds of functions that get placed into a pipeline: sequential and non-sequential. Sequential functions, like map and grep, do not need the entire input list before they begin generating output. Non-sequential functions, like sort and reverse, need the entire input list before any output can be generated. In this regard, non-sequential functions cannot be helped.
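To make the sequential/non-sequential distinction concrete, here is a minimal Perl 5 sketch in which each pipeline stage is a closure that returns one item per call and an empty list at end of input. The helper names counting_source, seq_lc and nonseq_reverse are hypothetical, made up for illustration; this is not a proposed Perl 6 API. The sketch counts how many inputs each stage has to pull before it can produce its first output.

```perl
use strict;
use warnings;

# Hypothetical source: hands out one item per call and counts the pulls.
# An empty return list signals end of input.
sub counting_source {
    my @items  = @_;
    my $pulled = 0;
    my $iter   = sub {
        return unless @items;
        $pulled++;
        return shift @items;
    };
    return ( $iter, \$pulled );
}

# Sequential stage (like map): one input is enough for one output.
sub seq_lc {
    my ($next) = @_;
    return sub {
        my ($item) = $next->() or return;
        return lc $item;
    };
}

# Non-sequential stage (like reverse): the first output is the last
# input, so the whole upstream must be drained before anything is emitted.
sub nonseq_reverse {
    my ($next) = @_;
    my ( @buffer, $filled );
    return sub {
        unless ($filled) {
            while ( my ($item) = $next->() ) { push @buffer, $item }
            $filled = 1;
        }
        return unless @buffer;
        return pop @buffer;
    };
}

my ( $src1, $count1 ) = counting_source(qw(A B C D));
my ($first_seq) = seq_lc($src1)->();
printf "sequential:     first=%s after %d pull(s)\n", $first_seq, $$count1;

my ( $src2, $count2 ) = counting_source(qw(A B C D));
my ($first_rev) = nonseq_reverse($src2)->();
printf "non-sequential: first=%s after %d pull(s)\n", $first_rev, $$count2;
```

Run as-is, the sequential stage reports 1 pull and the non-sequential one reports 4: reverse cannot hand out its first element until it has seen the last one.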

Sequential ones, on the other hand, should not need to collect their entire input before passing on output. I've seen discussions of lazy lists around here before, but I forget the context. Could one write a lazy version of C<map>, which reads an input only when an output is requested? If so, something like:

@x = @y ==> map lc ==> grep length == 4;

would behave quite differently from:

@temp = map lc, @y;
@x = grep length == 4, @temp;

in that no @temp, not even an internal one, would need to be built.
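In Perl 5 terms, a lazy map can be approximated with closures: each stage is a sub that pulls a single item from the stage upstream only when it is itself asked for output. The sketch below assumes that representation (one item per call, an empty list at end of input); lazy_map and iter_from_array are hypothetical names, and nothing here says how Perl 6 would actually implement ==>.

```perl
use strict;
use warnings;

# Hypothetical source: turns an array into a pull-based iterator that
# returns one element per call and an empty list when exhausted.
sub iter_from_array {
    my @items = @_;
    return sub { @items ? shift @items : () };
}

# Lazy map: nothing is computed when the stage is built; $f runs once per
# output actually requested, on a single input pulled from upstream.
sub lazy_map {
    my ( $f, $next ) = @_;
    return sub {
        my ($item) = $next->() or return;
        local $_ = $item;    # let $f use $_, as with the built-in map
        return $f->($item);
    };
}

my $calls = 0;
my $upper = lazy_map( sub { $calls++; uc }, iter_from_array(qw(a b c)) );

my $calls_before = $calls;       # building the stage did no work
my ($first) = $upper->();        # pulls exactly one input
my $calls_after_one = $calls;

my @rest;
while ( my ($x) = $upper->() ) { push @rest, $x }
print "first=$first rest=@rest calls=$calls\n";
```

The counter shows the mapping function running zero times when the stage is built, once after the first request, and three times only after every output has been pulled.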

Of course, I'd want to be able to construct my own map-like functions, and it should be easy to write them with pipeline performance in mind.
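One possible shape for user-written stages, again as a Perl 5 sketch under the same one-item-per-call iterator assumption: a hypothetical make_stage builder takes a per-item step that returns zero or more outputs, so both map-like and grep-like stages fall out of it and stay lazy without extra effort from whoever writes the step. Chained together, the two stages below mirror the lc/grep pipeline above without building any @temp.

```perl
use strict;
use warnings;

# Hypothetical stage builder. $step sees one input (also in $_) and returns
# zero or more outputs, so map-like and grep-like stages are both special
# cases. The built stage pulls from upstream only when asked for output.
sub make_stage {
    my ($step) = @_;
    return sub {
        my ($next) = @_;
        my @pending;    # outputs already produced but not yet pulled
        return sub {
            until (@pending) {
                my ($item) = $next->() or return;
                local $_ = $item;
                @pending = $step->($item);
            }
            return shift @pending;
        };
    };
}

my $lc_stage   = make_stage( sub { lc } );
my $len4_stage = make_stage( sub { length($_) == 4 ? $_ : () } );

# Hypothetical in-memory source standing in for @y.
my @y      = qw(Perl Camel ruby LISP Python Java);
my @copy   = @y;
my $source = sub { @copy ? shift @copy : () };

# Roughly: @x = @y ==> map lc ==> grep length == 4;
my $pipeline = $len4_stage->( $lc_stage->($source) );

my @x;
while ( my ($word) = $pipeline->() ) { push @x, $word }
print "@x\n";    # perl ruby lisp java
```

End of input is detected by counting the values a stage returns (the list assignment in `my ($item) = ... or return`) rather than by testing their truth, so items such as "0" or "" are not mistaken for the end of the stream.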

Just a thought, since with the introduction of ==> and <==, pipelining is bound to become an even more common construct.

-- Rod Adams

Randy Sims's test case:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw(cmpthese);

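my $dict = 'projects/qotw/words';

# Prime the disk cache so neither variant pays for the first read.
open( my $fh, '<', $dict ) or die "Can't open $dict: $!";
my @result = <$fh>;
close( $fh );

# Run each variant 10 times and print a comparison table.
cmpthese( 10, {
    'simple'   => sub { simple( $dict, 5 ) },
    'compound' => sub { compound( $dict, 5 ) },
});

# Explicit loop: each line is read, chomped, lowercased and tested in turn.
sub simple {
    my( $dict, $word_length ) = @_;
    open( my $fh, '<', $dict ) or die "Can't open $dict: $!";
    my @result;
    while (defined( my $line = <$fh> )) {
        chomp( $line );
        $line = lc( $line );
        push( @result, $line ) if length( $line ) == $word_length;
    }
    close( $fh );
}

# Pipeline: <$fh> in list context reads the whole file into a temporary
# list, and map builds a second one before grep sees a single element.
sub compound {
    my( $dict, $word_length ) = @_;
    open( my $fh, '<', $dict ) or die "Can't open $dict: $!";
    my @result = grep length( $_ ) == $word_length,
                 map { chomp; lc }
                 <$fh>;
    close( $fh );
}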

__END__
