Dr.Ruud wrote:
Jeff Peng wrote:
Can the code (especially the regex) below be optimized to run faster?
#!/usr/bin/perl
for ($i=0; $i<1000; $i+=1) {
    open HD,"index.html" or die $!;
    while(<HD>) {
        print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
    }
    close HD;
}
Let me first "normalize" the code.
#!/usr/bin/perl
use strict;
use warnings;
my $fname = "index.html";
for my $i ( 0 .. 999 ) {
    open my $fh, "<", $fname or die $!;
    while( <$fh> ) {
        print $1,"\n"
            if m{href="http://(.*?)/.*" target="_blank"};
    }
    close $fh;
}
So it captures hostnames out of href/target strings
(only out of the first match in each line, for example).
I would add a question mark after the second ".*" to minimize
backtracking. But that changes the meaning.
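To make the change in meaning concrete, here is a small sketch. The sample line and the hostnames a.example and b.example are made up for illustration. With a single match per line, $1 comes out the same either way; the difference is how far the whole match extends, which becomes visible under /g.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test line: two links, both with target="_blank".
my $line = '<a href="http://a.example/x" target="_blank">A</a> '
         . '<a href="http://b.example/y" target="_blank">B</a>';

# Greedy second .* runs to the LAST '" target="_blank"' on the line,
# so a global match finds only one host.
my @greedy = $line =~ m{href="http://(.*?)/.*" target="_blank"}g;

# Non-greedy .*? stops at the FIRST '" target="_blank"',
# so the global match can go on to the second link.
my @lazy = $line =~ m{href="http://(.*?)/.*?" target="_blank"}g;

print "greedy: @greedy\n";    # greedy: a.example
print "lazy:   @lazy\n";      # lazy:   a.example b.example
```

So whether the extra question mark is acceptable depends on whether anything relies on the full extent of the match.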
Further, there is no need to open the file 1000 times; see perldoc -f seek.
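A minimal sketch of the seek idea, under some assumptions: the string $html is a stand-in for index.html so the example runs on its own, and the counter $hits replaces the print only to keep the output short. Neither is in the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the contents of index.html (assumed sample data).
my $html = qq{<a href="http://www.example.com/page" target="_blank">Example</a>\n};

# Open once. An in-memory handle is used here; a handle on a real
# file can be rewound with seek in the same way.
open my $fh, "<", \$html or die $!;

my $hits = 0;
for my $i ( 0 .. 999 ) {
    seek $fh, 0, 0 or die $!;    # rewind to the start instead of reopening
    while ( <$fh> ) {
        $hits++ if m{href="http://(.*?)/.*" target="_blank"};
    }
}
close $fh;

print "$hits matches\n";    # 1000 matches
```

With a real file, the open line would take $fname as in the normalized code above, and the rest stays as it is.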
And for the sake of argument, the regex at best makes
assumptions about what's in index.html, at worst, it
gives incorrect results, e.g., from the following:
<html>
<a href="http://www.amazon.com/">Amazon</a> <a href="http://www.google.com/" target="_blank">Google</a>
</html>
I would assume from the regex that Google's address
is the one the user wants, but when both links are on
the same line, Amazon's is what he will get.
Before going to the trouble of optimizing for speed,
I think it would be best to optimize for correctness
first. :-)
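Here is a sketch of the problem and one possible fix. The tighter pattern is only my guess at the intent (the host and the rest of the URL must not contain a double quote), not something from the original post, and the variable names $loose and $tight are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample from above, with both links on a single line.
my $line = '<a href="http://www.amazon.com/">Amazon</a> '
         . '<a href="http://www.google.com/" target="_blank">Google</a>';

# Original pattern: ".*" may run across the first link's closing quote
# and on into the second link.
my ($loose) = $line =~ m{href="http://(.*?)/.*" target="_blank"};

# Tighter pattern (an assumption about the intent): neither the host
# nor the rest of the URL may contain a double quote, so the match
# cannot spill from one href attribute into the next.
my ($tight) = $line =~ m{href="http://([^/"]+)/[^"]*" target="_blank"};

print "loose: $loose\n";    # loose: www.amazon.com
print "tight: $tight\n";    # tight: www.google.com
```

The negated character classes also leave the regex engine less room to backtrack than ".*" does, so this version may well help with speed too. It still assumes that target="_blank" directly follows the href attribute, which real HTML does not guarantee.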
--
Brad