On Thu, Jun 10, 2010 at 10:26, Rob Dixon <rob.di...@gmx.com> wrote: snip > It is possible that index() is faster than regular expressions, but I would > write the code below. snip
It looks like, at least on Perl 5.12.0, iterated_regex is faster than index for finding non-overlaping substrings when the substring is small, but index starts to win as the substring grows in size. The real winner is split though. It is basically the same algorithm as iterated_regex, but the loop is implemented in C instead of Perl. I would probably still go with an inline pure_regex type implementation. I don't see the value of making a function to do this unless profiling shows that I have a bottleneck in that section of the code and need the extra performance of something that can't be easily inlined. Perl version 5.012000 index => 2 iterated_regex => 2 pure_regex => 2 split => 2 Rate pure_regex index iterated_regex split pure_regex 608548/s -- -6% -46% -74% index 649176/s 7% -- -42% -73% iterated_regex 1124391/s 85% 73% -- -52% split 2360644/s 288% 264% 110% -- Rate pure_regex index iterated_regex split pure_regex 613304/s -- -6% -44% -75% index 649177/s 6% -- -41% -73% iterated_regex 1101602/s 80% 70% -- -54% split 2406041/s 292% 271% 118% -- Rate pure_regex index iterated_regex split pure_regex 590160/s -- -9% -46% -75% index 651987/s 10% -- -40% -73% iterated_regex 1092266/s 85% 68% -- -55% split 2406041/s 308% 269% 120% -- Rate pure_regex index iterated_regex split pure_regex 601510/s -- -7% -47% -75% index 649176/s 8% -- -42% -73% iterated_regex 1124391/s 87% 73% -- -54% split 2429401/s 304% 274% 116% -- Rate pure_regex index iterated_regex split pure_regex 590160/s -- -8% -48% -76% index 643109/s 9% -- -43% -74% iterated_regex 1137401/s 93% 77% -- -54% split 2477508/s 320% 285% 118% -- substring of 10 sets Rate pure_regex iterated_regex index split pure_regex 128/s -- -47% -56% -91% iterated_regex 241/s 88% -- -17% -84% index 290/s 126% 20% -- -80% split 1478/s 1050% 513% 409% -- substring of 100 sets Rate iterated_regex pure_regex index split iterated_regex 522/s -- -35% -64% -74% pure_regex 807/s 55% -- -44% -60% index 1437/s 175% 78% -- -28% split 1999/s 283% 148% 39% -- substring of 1000 sets Rate iterated_regex pure_regex split index iterated_regex 599/s -- -62% -71% -75% pure_regex 1569/s 162% -- -24% -36% split 2055/s 243% 31% -- -16% index 2443/s 308% 56% 19% -- substring of 10000 sets Rate iterated_regex pure_regex split index iterated_regex 570/s -- -56% -68% -77% pure_regex 1309/s 130% -- -26% -48% split 1761/s 209% 34% -- -30% index 2510/s 340% 92% 43% -- #!/usr/bin/perl use strict; use warnings; use Benchmark; print "Perl version $]\n"; my $string = "ababababab"; my $substring = "abab"; my %subs = ( pure_regex => sub { return scalar( ()= $string =~ /\Q$substring/g ); }, iterated_regex => sub { my $n; $n++ while $string =~ /\Q$substring/g; return $n; }, index => sub { my $offset = 0; my $result = index $string, $substring, $offset; my $length = length $substring; my $n; while ($result != -1) { $n++; $offset = $result + $length; $result = index $string, $substring, $offset; } return $n; }, split => sub { my $c = split /\Q$substring/, $string; #handle the empty string case return $c ? $c - 1 : 0; } ); for my $sub (sort keys %subs) { print "$sub => ", $subs{$sub}->(), "\n"; } for $string ("abab", "abab" x 10, ("x" x 1000) . "abab", "x" x 10_000, "ab" x 10_000 ) { Benchmark::cmpthese -1, \%subs; } $string = "abab" x 100_000; for my $n (10, 100, 1_000, 10_000) { print "\nsubstring of $n sets\n"; $substring = "abab" x $n; Benchmark::cmpthese -1, \%subs; } -- Chas. Owens wonkden.net The most important skill a programmer can have is the ability to read. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/