On Tue, Aug 19, 2003 at 05:54:42PM -0500, Dan Muey wrote:
> Howdy all:
> 
> I'm trying to figure out the best way to test a string against a list of
> regexes, like so:
> 
> my @regex = qw(qr(joe$) qr(^mama) qr([abc]));

As was already pointed out, don't use qw() here: it gives you a list of
plain strings such as 'qr(joe$)', not compiled regexes.
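
A minimal sketch of the difference, not taken from the original post. The
test string and the use of slashes as qr delimiters are my own choices for
illustration:

```perl
use strict;
use warnings;

# qw() only splits on whitespace, so each element is a plain string.
my @strings = qw(qr(joe$) qr(^mama) qr([abc]));

# A normal list of qr// objects holds compiled patterns.
my @regex = (qr/joe$/, qr/^mama/, qr/[abc]/);

print "qw element:   '$strings[0]' (ref: '", ref($strings[0]), "')\n";
print "list element: ref is '", ref($regex[0]), "'\n";

# Made-up test string; grep in scalar context counts the matching patterns.
my $string  = "mama said hi to joe";
my $matched = grep { $string =~ $_ } @regex;
print "$matched of ", scalar(@regex), " patterns matched\n";
```

ref() returns an empty string for the qw() elements and 'Regexp' for the
qr// ones, which is a quick way to check what a list really holds.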

Here are some interesting benchmarks.  The 'docs_in_col' file was
about 16k, and the strings I was testing for were right at the bottom.

#!/usr/bin/perl

use warnings;
use strict;
use IO::File;
use Benchmark;

my $fh = IO::File->new("docs_in_col", "r")
    or die "Cannot open docs_in_col: $!";

# Slurp the whole file into a single string.
my $str;
{
    local $/;
    $str = <$fh>;
}

my $alt_re  = qr(Zucker|Zuckerman|Zurrow);
my @grep_re = (qr(Zucker), qr(Zuckerman), qr(Zurrow));

timethese(5000, 
          { 
              match_alts_scalar => \&match_alts_scalar,
              grep_mults => \&grep_mults,
          }
         );

###----------------------------------------------------

sub match_alts_scalar {
    my $found = ($str =~ /$alt_re/);
    return $found;
}

###----------------------------------------------------

sub grep_mults {
    my $found = grep { $str =~ /$_/ } @grep_re;
    return $found;
}


And here are the results:

Benchmark: timing 5000 iterations of grep_mults, match_alts_scalar...
grep_mults:  1 wallclock secs ( 0.54 usr +  0.00 sys =  0.54 CPU) @ 9275.36/s (n=5000)
match_alts_scalar: 40 wallclock secs (39.20 usr +  0.02 sys = 39.22 CPU) @ 127.49/s (n=5000)


This should only be considered a first approximation, since there are
various things I'm not controlling for (all my strings were at the
end of the file, they were fixed strings rather than real patterns, etc.).

Still, I would use the grep version.
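
One thing to keep in mind: grep always tries every pattern, and in scalar
context it returns the number of patterns that matched. If you only need
to know whether anything matched, stopping at the first hit may be
enough. A rough sketch using first() from List::Util; the first_match
helper and the sample text are hypothetical, and I have not benchmarked
this variant:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @grep_re = (qr(Zucker), qr(Zuckerman), qr(Zurrow));

# Hypothetical helper: returns the first pattern that matches, or undef.
sub first_match {
    my ($text, @patterns) = @_;
    return first { $text =~ $_ } @patterns;
}

# Made-up stand-in for the contents of the file.
my $sample = "Abbot\nZurrow\n";

my $hit = first_match($sample, @grep_re);
print defined $hit ? "matched $hit\n" : "no match\n";

# For comparison: grep in scalar context counts the matching patterns.
my $count = grep { $sample =~ $_ } @grep_re;
print "$count pattern(s) matched\n";
```

The returned qr object tells you which pattern hit, which the plain
true/false result of the alternation does not.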


--Dks
