Re: regex optimization

Brad Baxter Wed, 06 Jan 2010 23:10:01 -0800

Dr.Ruud wrote:

Jeff Peng wrote:

Can the code (specially the regex) below be optimized to run faster?

#!/usr/bin/perl
for ($i=0; $i<1000; $i+=1) {

 open HD,"index.html" or die $!;
 while(<HD>) {
   print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
 }
 close HD;
}


Let me first "normalize" the code.

  #!/usr/bin/perl
  use strict;
  use warnings;

  my $fname = "index.html";

  for my $i ( 0 .. 999 ) {

      open my $fh, "<", $fname or die $!;

      while( <$fh> ) {
          print $1,"\n"
            if m{href="http://(.*?)/.*" target="_blank"};
      }
      close $fh;
  }

So it captures hostnames out of href/target strings
(though only out of the first one on each line).

I would add a question mark after the second ".*", to minimize backtracking. But that changes the meaning.
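
As a rough illustration of how the extra question mark changes what is matched, here is a small sketch. The sample line and the variable names are invented for this example; they are not from the original index.html.

```perl
use strict;
use warnings;

# Hypothetical input: two links on one line, both with target="_blank".
my $line = '<a href="http://a.example/x" target="_blank">A</a> '
         . '<a href="http://b.example/y" target="_blank">B</a>';

# Greedy tail: ".*" first runs to the end of the line, then backtracks
# to the LAST '" target="_blank"', so one match swallows both links.
my @greedy_hosts = $line =~ m{href="http://(.*?)/.*" target="_blank"}g;

# Non-greedy tail: ".*?" stops at the FIRST '" target="_blank"',
# so the second link can be matched separately.
my @lazy_hosts = $line =~ m{href="http://(.*?)/.*?" target="_blank"}g;

print "greedy:     @greedy_hosts\n";
print "non-greedy: @lazy_hosts\n";
```

Without /g the captured hostname is the same in both cases for this line; what differs is how much text the whole match covers, which starts to matter as soon as more than one link per line is wanted.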


Further, there is no need to open the file 1000 times; see perldoc -f seek.
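
A minimal sketch of that idea, assuming the file does not change between passes. The sub name scan_hosts and its parameters are hypothetical, chosen only for this example.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: scan $fname $passes times using a single open().
sub scan_hosts {
    my ( $fname, $passes ) = @_;
    my @hosts;

    open my $fh, "<", $fname or die "Cannot open $fname: $!";

    for my $i ( 1 .. $passes ) {
        # Rewind to the start of the file instead of reopening it.
        seek $fh, 0, SEEK_SET or die "Cannot seek in $fname: $!";

        while ( my $line = <$fh> ) {
            push @hosts, $1
              if $line =~ m{href="http://(.*?)/.*" target="_blank"};
        }
    }
    close $fh;
    return @hosts;
}
```

It could be called as print "$_\n" for scan_hosts("index.html", 1000); to mirror the loop above.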


And for the sake of argument, the regex at best makes
assumptions about what's in index.html, at worst, it
gives incorrect results, e.g., from the following:

<html>

<a href="http://www.amazon.com/">Amazon</a> <a href="http://www.google.com/" target="_blank">Google</a>

</html>

I would assume from the regex that google's address
is the one the user wants, but amazon's is what he
will get.

Before going to the trouble of optimizing for speed,
I think it would be best to optimize for correctness
first.  :-)
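
For what it's worth, here is a sketch of one possible tightening, assuming the intent is "the hostname of a link whose href is immediately followed by target="_blank"". It is still only a regex and not an HTML parser, so it makes its own assumptions about the markup; the variable names are made up for this example.

```perl
use strict;
use warnings;

# The two-link sample line, written as a Perl string.
my $line = '<a href="http://www.amazon.com/">Amazon</a> '
         . '<a href="http://www.google.com/" target="_blank">Google</a>';

# Original pattern: ".*" may run past the end of the first tag.
my ($loose) = $line =~ m{href="http://(.*?)/.*" target="_blank"};

# Tightened pattern (an assumption about the intent): neither the host
# nor the rest of the URL may contain a double quote, so the match
# cannot leave the attribute value it started in.
my ($tight) = $line =~ m{href="http://([^/"]+)/[^"]*" target="_blank"};

print "loose: $loose\n";
print "tight: $tight\n";
```

On this line the loose pattern yields www.amazon.com and the tightened one yields www.google.com. The negated character classes also give the regex engine less room to backtrack, so correctness and speed need not pull in opposite directions here.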

--
Brad

