Re: Benchmark puzzle

Uri Guttman Thu, 11 Nov 2010 13:12:54 -0800

>>>>> "MM" == Mike McClain <mike.j...@nethere.com> writes:


  MM> Thinking the number of iterations of the benchmark loop might be
  MM> low enough to affect the results I doubled it and reran the above
  MM> command.

a good trick when using benchmark is to support a command line arg that
sets the iteration count, with a default. also the count can be negative,
in which case it is the minimum number of cpu seconds to run each entry.
that can be more useful or quicker to run.
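
a minimal sketch of that trick, not your actual script. the two entries
and their names are placeholders picked just for illustration. the first
command line arg is the count and it defaults to -2, i.e. at least 2 cpu
seconds per entry.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# first command line arg is the count:
#   positive = number of iterations per entry
#   negative = minimum cpu seconds per entry
# the default of -2 is an assumption for this sketch.
my $count = shift(@ARGV) || -2;

my $word = "Thequickbrownfox..";

# placeholder entries, only here so cmpthese has something to compare
cmpthese( $count, {
    via_unpack => sub { my @list = unpack "(A3)*", $word },
    via_regex  => sub { my @list = $word =~ /(.{1,3})/g },
} );
```

run it with no args to get the timed default, or pass something like
100000 to get a fixed number of iterations.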

  MM> Condensed here are the results:
  MM> 2.5M reps
  MM> run 1       run 2       run 3       run 4       run 5
  MM> via_match   via_substr  via_arrays  via_arrays  via_match2
  MM> via_unpack  via_match   via_match   via_match   via_unpack
  MM> via_arrays  via_unpack  via_match2  via_substr  via_arrays
  MM> via_match3  via_arrays  via_substr  via_match3  via_match
  MM> via_match2  via_match2  via_match3  via_unpack  via_substr
  MM> via_substr  via_match3  via_unpack  via_match2  via_match3
  MM> 5M reps
  MM> run 1       run 2       run 3       run 4       run 5
  MM> via_match   via_match3  via_match2  via_unpack  via_unpack
  MM> via_substr  via_arrays  via_arrays  via_match2  via_match3
  MM> via_arrays  via_match   via_match3  via_match3  via_match
  MM> via_unpack  via_unpack  via_substr  via_arrays  via_substr
  MM> via_match3  via_match2  via_unpack  via_substr  via_arrays
  MM> via_match2  via_substr  via_match   via_match   via_match2

  MM> I'm at a loss to understand this and hope someone on the list
  MM> can explain the variance.

hard to say but you are probably running too many reps. sometimes if you
do so, other machine issues like ram and thrashing disks may affect
things. that is another reason to pass a negative count value to
benchmark. you can keep each run to say 2 seconds and you know your box
is mostly idle during that time.

  MM> use diagnostics;

why have that for a benchmark?

  MM> {   use Benchmark qw(:all);

  MM>     my $Testing = ($ARGV[0] || 0);

add in a count arg. and no need to initialize (or default a boolean) to
0

        my $count = shift || -2 ;
        my $Testing = shift ;



  MM>     my $word = "Thequickbrownfox..";
  MM>     my $size = 3;

  MM>     if( $Testing )
  MM>     {   via_arrays($word, $size);
  MM>         via_substr($word, $size);
  MM>         via_unpack($word, $size);
  MM>         via_match( $word, $size);
  MM>         via_match2($word, $size);
  MM>         via_match3($word, $size);

just exit here and no need for the else!

  MM>     }
  MM>     else
  MM>     {   print "this is a benchmark, wait ... \n";

no need for that if you use a better count

  MM>         cmpthese( 5_000_000, {

        cmpthese( $count, {

  MM>     sub via_arrays  #   
  MM>     {   my ($word, $size) = @_;
  MM>         my @array = split //, $word;
  MM>         my $max = @array - $size;
  MM>         my @list = ();
  MM>         for( my $i=0; $i<$max; $i+=$size )
  MM>         {   push @list, join '', @array[ $i .. $i+$size-1 ];
  MM>         }
  MM>         print( 'via_arrays=', map { "$_," } @list, "\n")
  MM>             if($Testing);
  MM>     }

i would call that via_chars as you deal with individual chars. by all
things perl that should be the slowest as it does the most work.


  MM>     sub via_unpack  #   
  MM>     #           unpack("x$offset A$length", $what);
  MM>         {   my ($word, $size) = @_;
  MM>         my $max = length( $word ) - $size;
  MM>         my @list = ();
  MM>         my $i=0;
  MM>         do
  MM>         {   push @list, (unpack( "x$i A$size", $word ));

unpack has a repeat count that also works on groups. it will be much
faster to use it.

perl -le '$x = 3 ;@a = unpack "(A$x)*", "abcdefgh"; print "@a"'
abc def gh
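
as a sub that could drop into the benchmark, that one-liner might look
like the sketch below. the name via_unpack_group is made up for
illustration. note two assumptions: the A format strips trailing spaces
from each chunk (use a to keep them), and unlike the loop version this
keeps a short final chunk.

```perl
use strict;
use warnings;

# split $word into $size-char chunks with a single unpack call.
# hypothetical name, not part of the original script.
sub via_unpack_group {
    my ($word, $size) = @_;

    # (A$size)* repeats the group until the string is used up.
    # the last chunk may be shorter than $size.
    return unpack "(A$size)*", $word;
}

my @list = via_unpack_group( "Thequickbrownfox..", 3 );
print join( ',', @list ), "\n";    # The,qui,ckb,row,nfo,x..
```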


  MM>     }

  MM>     sub via_match   #   
  MM>     {   my ($word, $size) = @_;
  MM>         my @list = ();
  MM>         push @list, substr( $word, $-[0], $size )
  MM>             while $word =~ /.{$size}/g;
  MM>         print( 'via_match =', map { "$_," } @list, "\n")
  MM>             if($Testing);
  MM>     }

gack! why not just grab and use $1? the extra substr and @- is wasted.
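
a sketch of the same loop grabbing $1 instead. the sub name is made up
for illustration, and it returns the list rather than printing it. like
your version it only collects full $size chunks.

```perl
use strict;
use warnings;

# same while loop as via_match, but capture the chunk directly.
# hypothetical name, not part of the original script.
sub via_match_capture {
    my ($word, $size) = @_;
    my @list;

    # each /g match in scalar context picks up where the last one ended
    push @list, $1 while $word =~ /(.{$size})/g;

    return @list;
}

my @list = via_match_capture( "abcdefgh", 3 );
print "@list\n";    # abc def
```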

  MM>     sub via_match2  #   
  MM>     {   my ($word, $size) = @_;
  MM>         my @list = ();
  MM>         push @list, $_      for $word =~ /.{$size}/g;
  MM>         print( 'via_match2=', map { "$_," } @list, "\n")
  MM>             if($Testing);
  MM>     }

  MM>     sub via_match3  #   
  MM>     {   my ($word, $size) = @_;
  MM>         my @list = $word =~ /.{$size}/g;
  MM>         print( 'via_match3=', map { "$_," } @list, "\n")
  MM>             if($Testing);
  MM>     }
  MM> }

those last two are also silly. a simple /g and grab does this in one
line.

perl -le '$x = 3 ;@a = "abcdefgh" =~ /(.{1,$x})/g; print "@a"'
abc def gh
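
wrapped up as a sub for the benchmark it could look like this sketch.
the name via_regex_list is made up for illustration. the {1,$size}
quantifier is what keeps the short final chunk, and . will not match a
newline unless you add /s.

```perl
use strict;
use warnings;

# a /g match in list context returns all the captures at once.
# hypothetical name, not part of the original script.
sub via_regex_list {
    my ($word, $size) = @_;
    return $word =~ /(.{1,$size})/g;
}

my @list = via_regex_list( "abcdefgh", 3 );
print "@list\n";    # abc def gh
```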

so i am not as concerned about the inconsistency of your benchmark as
about how you coded up your choices. the faster ones i showed you should
always be (much) faster than the code you have. see if that clears
things up.

uri

-- 
Uri Guttman  ------  u...@stemsystems.com  --------  http://www.sysarch.com --
-----  Perl Code Review , Architecture, Development, Training, Support ------
---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com ---------

