On Jun 27, Nikola Janceski said:

>our %except = (
>       "test1" => [ "stuff to add" ],
>       "test2" => [ "more stuff to add" ]
>       # etc. for about 4 or 5 more
>       );
[snip]
>FILE: foreach my $file ( @arrayOfManyManyFiles ){
>
>       foreach my $test (keys %except) {
>               if($file =~ /$test/){
>                       # do stuff with @{ $except{$test} }
>                       next FILE;
>               }
>       }

Ok, here comes the long-awaited (and requested) qr// story.  This is all
about how regexes are compiled (and not compiled).

When Perl sees a regex in your code, it checks whether the regex can be
compiled at compile-time (like /foo/ or qr{blah}).  If so, then you're all
good.  If the regex contains variables, or anything else whose value isn't
known until run-time, Perl waits until run-time to compile it.  That means
/([$letters])/ is compiled at run-time.
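A tiny runnable sketch of the two cases follows. The variables $text and $letters are made up for illustration. The first pattern has no variables, so Perl can compile it at compile-time. The second uses $letters, so it has to wait until run-time.

```perl
use strict;
use warnings;

my $text = "perl rules";   # hypothetical sample string

# No variables: Perl can compile this regex at compile-time.
my @fixed = ($text =~ /([aeiou])/g);

# $letters only gets its value at run-time, so this regex is compiled then.
my $letters = 'aeiou';
my @found = ($text =~ /([$letters])/g);

print "@fixed\n";   # e u e
print "@found\n";   # e u e
```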

Now, when a regex is compiled, Perl also keeps a string representation of
the regex around for quick comparisons.  Why?  Because when Perl sees a
regex with variables in it, it has to check, every time the match runs,
whether the regex needs to be recompiled, as in this example:

  for (1 .. 10) {
    if ($string =~ /order '$_'/) { ... }
  }

The regex is compiled 10 times (/order '1'/, /order '2'/, etc.).  Each
time the match runs, Perl interpolates the variables and compares the
resulting string with the string version of the regex it compiled last
time; it recompiles only if the two differ.  That means that while

  for (['this', 'that'], ['mine', 'yours']) {
    my ($left, $right) = @$_;
    if ($str =~ /$left$right/) { ... }
  }

requires the regex to be compiled twice (/thisthat/, /mineyours/), this
code:

  for (['this', 'that'], ['thisth', 'at']) {
    my ($left, $right) = @$_;
    if ($str =~ /$left$right/) { ... }
  }

does NOT recompile the regex, because the string representation is exactly
the same (/thisthat/, /thisthat/).
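The "recompile only if the string changed" rule can be modeled in a few lines of Perl. This is a toy simulation, NOT Perl's actual internals: the hypothetical cached_match() remembers the last pattern string and the qr// object built from it, and only builds a new one (our stand-in for "compiling") when the interpolated string differs. Counting those builds reproduces the numbers from the three loops above.

```perl
use strict;
use warnings;

# Toy model of one match op's cache (hypothetical, for illustration only):
# the last pattern string seen and the regex compiled from it.
my ($last_pattern, $last_regex);
my $compiles = 0;

sub cached_match {
    my ($str, $pattern) = @_;
    # Recompile only when the interpolated string differs from last time.
    if (!defined $last_pattern || $pattern ne $last_pattern) {
        $last_regex   = qr/$pattern/;
        $last_pattern = $pattern;
        $compiles++;
    }
    return $str =~ $last_regex;
}

# Runs one "loop" of patterns through a fresh cache; returns the compile count.
sub count_compiles {
    my @pairs = @_;
    ($last_pattern, $last_regex, $compiles) = (undef, undef, 0);
    cached_match('some string', join '', @$_) for @pairs;
    return $compiles;
}

my $orders   = count_compiles(map { ["order '$_'"] } 1 .. 10);
my $distinct = count_compiles(['this', 'that'], ['mine', 'yours']);
my $same     = count_compiles(['this', 'that'], ['thisth', 'at']);

print "$orders $distinct $same\n";   # 10 2 1
```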

Now, when you have a regex with variables in it, Perl will ALWAYS do the
string comparison check, unless you give the regex the /o modifier, which
tells Perl to compile the regex once and forget it has variables in it.
Thus, in the second iteration of

  for (['this', 'that'], ['mine', 'yours']) {
    my ($left, $right) = @$_;
    if ($str =~ /$left$right/o) { ... }  # /o !!!
  }

the regex is still /thisthat/, not /mineyours/.
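Here is that loop made runnable, side by side with the same match minus /o. The target string 'mineyours' is an assumption, picked so the difference shows up on the second iteration.

```perl
use strict;
use warnings;

my $str = 'mineyours';   # hypothetical target string
my (@with_o, @without_o);

for (['this', 'that'], ['mine', 'yours']) {
    my ($left, $right) = @$_;
    # /o: the pattern is built from the FIRST iteration's values only.
    push @with_o,    ($str =~ /$left$right/o) ? 1 : 0;
    # No /o: the pattern is re-interpolated (and recompiled if changed).
    push @without_o, ($str =~ /$left$right/)  ? 1 : 0;
}

print "with /o:    @with_o\n";     # 0 0
print "without /o: @without_o\n";  # 0 1
```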

The /o is irreversible.  Full stop.

  sub match {
    my ($str, $regex) = @_;
    return $str =~ /$regex/o;
  }

  print match("japhy", qr/a.h/) ? 1 : 0, "\n";  # 1
  print match("Perl",  qr/e.l/) ? 1 : 0, "\n";  # 0, because of /o

Because of the /o, the match() function will remember the first regex it
used, and ONLY that one.  Every subsequent call to match() will use the
regex /a.h/.
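One way out, sketched here assuming you're free to change the function, is to drop /o and bind the qr// object directly on the right-hand side of =~. match_qr() is a hypothetical name for the fixed version of match():

```perl
use strict;
use warnings;

# Same interface as match(), but without /o: every call uses the
# Regexp object it was actually given.
sub match_qr {
    my ($str, $regex) = @_;
    return $str =~ $regex ? 1 : 0;
}

my @results = (
    match_qr("japhy", qr/a.h/),   # 1
    match_qr("Perl",  qr/e.l/),   # 1: no /o, so the new regex is used
);
print "@results\n";   # 1 1
```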

Now, how does qr// come into this?  qr// creates a compiled regex (a
Regexp object).  When that object is used on its own as the pattern, Perl
won't need to recompile your regex.

  @strings = qw( this that those );
  @rex = (qr/this/, qr/that/, qr/those/);
  @words = (...);  # assume 10 elements

  for my $w (@words) {
    for my $s (@strings) {
      print "$w $s\n" if $w =~ $s;
    }
    for my $p (@rex) {
      print "$w $p\n" if $w =~ $p;
    }
  }

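The @strings loop is much slower than the @rex loop.  Why?  Because in the
@strings loop the pattern string changes on every match (this, that, those,
this, ...), so the comparison always fails and Perl recompiles.  That's
THREE compiles for each word in the @words array, meaning 30 compiles.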

The @rex loop, while it does still do 30 comparisons, does absolutely NO
compiles, because Perl recognizes that the regex is made up entirely of a
variable, and that variable is a compiled regex (a Regexp object).

Comparisons are very fast.  They're strcmp()s, really.  Compilation takes
longer.
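Here's a sketch of the two loops as runnable code, plus the core Benchmark module's cmpthese() to time them. The sample words are made up. The map { qr/$_/ } line is a common way to turn a list of pattern strings into compiled regexes once, up front. Exact timings will depend on your perl and your machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @strings = qw( this that those );
my @words   = qw( thistle thatch other those );   # made-up sample words

# Compile each pattern string ONCE, before any loop runs.
my @rex = map { qr/$_/ } @strings;

# String patterns: each match sees a different string than the last one,
# so Perl recompiles every time.
sub by_string {
    my @hits;
    for my $w (@words) {
        for my $s (@strings) { push @hits, "$w $s" if $w =~ $s }
    }
    return @hits;
}

# qr// patterns: already compiled, so no recompiling inside the loop.
sub by_rex {
    my @hits;
    for my $w (@words) {
        for my $i (0 .. $#rex) { push @hits, "$w $strings[$i]" if $w =~ $rex[$i] }
    }
    return @hits;
}

my @hits_string = by_string();
my @hits_rex    = by_rex();
print "$_\n" for @hits_rex;

# Runs each sub for at least 1 CPU second and prints a comparison table.
cmpthese(-1, { strings => \&by_string, compiled => \&by_rex });
```

Both loops find the same matches; only the amount of compiling differs.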

Now for YOUR problem, Nikola.  The issue is hash KEYS.  Hash keys must be
strings.  That means you can't REALLY store a regex as a hash key, because
it gets turned into a string, and loses the magic that tells Perl not to
recompile it.
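You can see the stringification for yourself: ref() on the qr// object says 'Regexp', but ref() on the hash key that came from it is empty, because the key is just a string now. The exact text of that string (something like (?^:a.h)) varies between Perl versions, so the sketch below only compares it with the object's own stringified form.

```perl
use strict;
use warnings;

my $rx = qr/a.h/;
my %by_key = ($rx => 'stuff');    # the key gets stringified here
my ($key)  = keys %by_key;

my $rx_ref  = ref $rx;    # 'Regexp'
my $key_ref = ref $key;   # '' : a plain string, the magic is gone
print "value: $rx_ref, key: '$key_ref', key text: $key\n";
```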

The solution?  Keep the compiled regex in the hash VALUE, along with the
rest of your satellite data:

  my %data = (
    this => [qr/this/, 'foo', 'bar'],
    that => [qr/that/, 'blah', 'gak!'],
    ...,
  );

  while (<FILE>) {
    for my $key (keys %data) {
      if (/$data{$key}[0]/) { ... }  # using the Regexp object
    }
  }
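Applied to Nikola's original %except table, a sketch might look like this. It builds a %data hash in the shape shown above (compiled regex in slot 0, the original stuff after it) once, before the loop. The file names are hypothetical sample data, and only two %except entries are shown.

```perl
use strict;
use warnings;

# Nikola's table, as in his message (two entries, for illustration).
our %except = (
    "test1" => [ "stuff to add" ],
    "test2" => [ "more stuff to add" ],
);

# Slot 0 holds the compiled regex; the rest is the original satellite data.
my %data = map { $_ => [ qr/$_/, @{ $except{$_} } ] } keys %except;

my @arrayOfManyManyFiles = qw( a_test1.txt b.txt c_test2.log );   # hypothetical
my %stuff_for;

FILE: foreach my $file (@arrayOfManyManyFiles) {
    foreach my $test (keys %data) {
        my ($re, @stuff) = @{ $data{$test} };
        if ($file =~ $re) {                # Regexp object: no recompile
            $stuff_for{$file} = \@stuff;   # "do stuff" with @stuff here
            next FILE;
        }
    }
}

print "$_: @{ $stuff_for{$_} }\n" for sort keys %stuff_for;
```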

Ta da.

NOTE: if you're just matching literal STRINGS, though, and not regexes,
I'd suggest using index() instead.
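For example, if the keys are plain substrings like test1 and test2, a sketch with index() looks like this (the file names are made up). index() returns the substring's position, or -1 if it isn't found, and it treats every character literally, so there's nothing to compile at all.

```perl
use strict;
use warnings;

my @files = qw( a_test1.txt b.txt c_test2.log );   # hypothetical sample
my @tests = qw( test1 test2 );

my @hits;
for my $file (@files) {
    for my $test (@tests) {
        # index() >= 0 means $test occurs somewhere in $file.
        if (index($file, $test) >= 0) {
            push @hits, "$file:$test";
            last;
        }
    }
}
print "@hits\n";   # a_test1.txt:test1 c_test2.log:test2
```

Because nothing is interpreted as a metacharacter, a key like "a.h" only matches a literal "a.h" this way.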

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]

