My question is: is there anything that can be done within Perl 6 to help alleviate this issue? There seem to be basically two types of functions that get placed into a pipeline: sequential and non-sequential. Sequential functions, like map and grep, do not need the entire input list to begin generating output. Non-sequential functions, like sort and reverse, need the entire input list before any output can be generated. In this regard, non-sequential functions cannot be helped.
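To make the distinction concrete, here is a minimal Perl 5 sketch (not Perl 6) that models each pipeline stage as an iterator: a closure that returns the next element, or an empty list when exhausted. The names `list_iter`, `imap` and `isort` are hypothetical helpers made up for illustration. A counter on the source records how many elements each stage has to read before it can hand back its first output.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): wrap an array reference in an
# iterator closure that returns the next element, or an empty list when
# exhausted. $pulls counts how many elements have been read so far.
sub list_iter {
    my ( $items, $pulls ) = @_;
    my $i = 0;
    return sub {
        return if $i >= @$items;
        $$pulls++;
        return $items->[ $i++ ];
    };
}

# Hypothetical sequential stage: pulls exactly one input per output.
sub imap {
    my ( $f, $next ) = @_;
    return sub {
        my @in = $next->() or return;    # empty list means end of input
        local $_ = $in[0];
        return scalar $f->();
    };
}

# Hypothetical non-sequential stage: must drain its whole input before
# it can emit the first element in sort order.
sub isort {
    my ($next) = @_;
    my ( @buffer, $filled );
    return sub {
        unless ($filled) {
            while ( my @in = $next->() ) { push @buffer, $in[0] }
            @buffer = sort @buffer;
            $filled = 1;
        }
        return unless @buffer;
        return shift @buffer;
    };
}

my @words = qw(Delta alpha Charlie bravo);

my $map_pulls = 0;
my $lower = imap( sub { lc($_) }, list_iter( \@words, \$map_pulls ) );
my ($first_lower) = $lower->();
print "map:  '$first_lower' after $map_pulls pull(s)\n";

my $sort_pulls = 0;
my $sorted = isort( list_iter( \@words, \$sort_pulls ) );
my ($first_sorted) = $sorted->();
print "sort: '$first_sorted' after $sort_pulls pull(s)\n";
```

The map stage produces 'delta' after a single pull, while the sort stage has to read all four words before it can return 'Charlie'.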
Sequential ones, on the other hand, should not need to collect their entire input before passing on output. I've seen discussions of lazy lists around here before, but I forget the context. Could one write a lazy version of C<map> that only reads an input when an output is requested? If so, something like:
@x = @y ==> map lc ==> grep length == 4;
Would behave quite differently from:
@temp = map lc, @y; @x = grep length == 4, @temp;
in that no version of @temp (not even an internal one) would be needed.
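Perl 5 has no `==>` operator, but the lazy behaviour being asked about can be approximated with iterator closures. In this sketch the helpers `iter_from`, `imap`, `igrep` and `drain` are hypothetical names, not an existing API. Each stage asks the previous one for a single element only when its own caller asks for output, so nothing corresponding to `@temp` is ever built; only the final `drain` collects a list, playing the role of `@x`.

```perl
use strict;
use warnings;

# Hypothetical helpers: an iterator is a closure returning the next
# element, or an empty list when the stream is exhausted.
sub iter_from {
    my ($items) = @_;    # array reference; the array is not copied
    my $i = 0;
    return sub { $i < @$items ? $items->[ $i++ ] : () };
}

# Lazy map: transforms one element per request.
sub imap {
    my ( $f, $next ) = @_;
    return sub {
        my @in = $next->() or return;
        local $_ = $in[0];
        return scalar $f->();
    };
}

# Lazy grep: keeps pulling until an element passes or the input runs out.
sub igrep {
    my ( $pred, $next ) = @_;
    return sub {
        while ( my @in = $next->() ) {
            local $_ = $in[0];
            return $in[0] if $pred->();
        }
        return;
    };
}

# Collect whatever remains in a stream into a list.
sub drain {
    my ($next) = @_;
    my @out;
    while ( my @in = $next->() ) { push @out, $in[0] }
    return @out;
}

my @y = qw(Perl Ruby Python Lisp Haskell Java);

# Rough Perl 5 equivalent of: @x = @y ==> map lc ==> grep length == 4;
my @x = drain(
    igrep( sub { length($_) == 4 }, imap( sub { lc($_) }, iter_from( \@y ) ) )
);
print "@x\n";
```

End of stream is signalled by an empty list rather than `undef`, so an `undef` element can still flow through the pipeline; callers test the element count of the returned list instead of its truth.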
Of course, I'd want to be able to write my own map-like functions, and it should be easy to write them with pipeline performance in mind.
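One possible shape for "easy to write your own" is a single generic constructor that turns an ordinary per-element function into a lazy stage. In this Perl 5 sketch the hypothetical `make_stage` takes a function that returns zero, one or several outputs for each input in `$_` (the same contract as the block given to Perl's built-in `map`), buffers any extras, and pulls new input only when its buffer is empty. A `map`, a `grep` and a user-defined word splitter then each take one line; all of these names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: lazy iterator over an array reference.
sub iter_from {
    my ($items) = @_;
    my $i = 0;
    return sub { $i < @$items ? $items->[ $i++ ] : () };
}

# Hypothetical constructor: $f maps one input ($_) to a list of outputs,
# like the block of Perl's built-in map. Returns a stage factory that
# takes an upstream iterator and returns a downstream iterator.
sub make_stage {
    my ($f) = @_;
    return sub {
        my ($next) = @_;
        my @pending;    # outputs produced but not yet requested
        return sub {
            until (@pending) {
                my @in = $next->() or return;    # upstream exhausted
                local $_ = $in[0];
                @pending = $f->();
            }
            return shift @pending;
        };
    };
}

# map- and grep-style stages defined in terms of make_stage.
sub lazy_map  { my ( $f, $next ) = @_; make_stage( sub { scalar $f->() } )->($next) }
sub lazy_grep { my ( $p, $next ) = @_; make_stage( sub { $p->() ? $_ : () } )->($next) }

# A user-defined stage: split each line into words (zero or more outputs).
my $split_words = make_stage( sub { split ' ' } );

# Collect whatever remains in a stream into a list.
sub drain {
    my ($next) = @_;
    my @out;
    while ( my @in = $next->() ) { push @out, $in[0] }
    return @out;
}

my @lines = ( 'The quick brown', 'fox JUMPS', '', 'over the lazy dog' );
my @short = drain(
    lazy_grep( sub { length($_) <= 3 },
        lazy_map( sub { lc($_) }, $split_words->( iter_from( \@lines ) ) ) )
);
print "@short\n";
```

Because every stage only needs to say what happens to one element, the buffering and pulling logic lives in one place; the empty line simply produces no words and the stage moves on to the next input.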
Just a thought, since with the introduction of ==> and <==, pipelining is bound to become an even more common construct.
-- Rod Adams
Randy Sims's test case:
#!/usr/bin/perl
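use strict;
use warnings;
use Benchmark qw(cmpthese);

my $dict = 'projects/qotw/words';

# Prime the disk cache so the first timed run is not penalized.
open( my $fh, '<', $dict ) or die "Cannot open $dict: $!";
my @result = <$fh>;
close( $fh );

cmpthese( 10, {
    'simple'   => sub { simple( $dict, 5 ) },
    'compound' => sub { compound( $dict, 5 ) },
});

# Streaming version: reads, chomps and lowercases one line at a time,
# keeping only words of the requested length.
sub simple {
    my( $dict, $word_length ) = @_;
    open( my $fh, '<', $dict ) or die "Cannot open $dict: $!";
    my @result;
    while (defined( my $line = <$fh> )) {
        chomp( $line );
        $line = lc( $line );
        push( @result, $line ) if length( $line ) == $word_length;
    }
    close( $fh );
}

# Pipelined version: <$fh> in list context reads the whole file, and map
# builds a second full list before grep filters it.
sub compound {
    my( $dict, $word_length ) = @_;
    open( my $fh, '<', $dict ) or die "Cannot open $dict: $!";
    my @result = grep length( $_ ) == $word_length, map { chomp; lc } <$fh>;
    close( $fh );
}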
__END__
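For comparison with `simple` and `compound`, here is a sketch of what a lazy `compound` might look like in Perl 5 today: the chained `map`/`grep` are replaced by iterator closures that read one line from the filehandle per request, so, like `simple`, it never holds the whole file in memory. To stay self-contained it reads from an in-memory filehandle instead of the dictionary file; `lazy_compound` is a hypothetical name, not part of Randy's test case, and no timings are claimed for it.

```perl
use strict;
use warnings;

# Hypothetical lazy variant of compound(): each stage pulls one line at a
# time, so no list of all lines (or all lowercased lines) is ever built.
sub lazy_compound {
    my ( $fh, $word_length ) = @_;

    # Source stage: one chomped line per call, empty list at EOF.
    my $lines = sub {
        defined( my $line = <$fh> ) or return;
        chomp $line;
        return $line;
    };

    # Equivalent of the map { lc } stage.
    my $lower = sub {
        my @in = $lines->() or return;
        return lc $in[0];
    };

    # Equivalent of the grep length == $word_length stage.
    my $matching = sub {
        while ( my @in = $lower->() ) {
            return $in[0] if length( $in[0] ) == $word_length;
        }
        return;
    };

    # Only the final results are collected.
    my @result;
    while ( my @in = $matching->() ) { push @result, $in[0] }
    return @result;
}

# In-memory filehandle standing in for the dictionary file.
my $words = "Apple\nberry\nKiwi\nplum\nMelon\npear\n";
open( my $fh, '<', \$words ) or die "Cannot open in-memory file: $!";
my @five = lazy_compound( $fh, 5 );
close($fh);
print "@five\n";
```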