regex optimization

Jeff Peng Mon, 04 Jan 2010 22:46:58 -0800

Hello,

Can the code below (especially the regex) be optimized to run faster?


#!/usr/bin/perl
for ($i=0; $i<1000; $i+=1) {

 open HD,"index.html" or die $!;
 while(<HD>) {
   print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
 }
 close HD;
}

The "index.html" file was fetched with:
wget http://www.265.com/Kexue_Jishu/


I ask this because someone posted a question on the ruby-talk list
showing that Perl's regex is much faster than Ruby's.

[Quote]
#!/usr/bin/ruby
1000.times do

 File.open("index.html").each do |c|
   puts $1 if /href="http:\/\/(.*?)\/.*" target="_blank"/ =~ c
 end
end

time ./test.rb >/tmp/t
elap 6.511 user 6.336 syst 0.136 CPU 99.40%


#!/usr/bin/perl
for ($i=0; $i<1000; $i+=1) {

 open HD,"index.html" or die $!;
 while(<HD>) {
   print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
 }
 close HD;
}

time ./test.pl >/tmp/t
elap 0.864 user 0.844 syst 0.020 CPU 100.04%

So perl is 7 or 8 times faster here.
[/Quote]


But someone else optimized the Ruby code and used Ruby's built-in
scan method, which makes the regex run a lot faster.

[Quote]
I get best results in Ruby with:

 regexp = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}
 1000.times do
  puts File.read('index.html').scan(regexp)
 end

~/ruby/bench time ruby19 regex.rb > /dev/null
real  0m1.428s
user  0m1.359s
sys  0m0.056s

~/ruby/bench time perl5.10.0 regex.pl > /dev/null
real  0m1.189s
user  0m1.095s
sys  0m0.084s

It's still slower. Perl has regular expression magic beyond my
imagination, though. I heard they take the most "rare" character in the
literal part of the regex (let's say, the colon) and search for it using
machine code, and then work their way backwards to the beginning of the
regexp...

Say what you want, but Perl rocks when it comes to text processing
speed.
[/Quote]
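
For comparison, here is a minimal Perl sketch of the same tightening: the
lazy ".*?" and greedy ".*" are replaced by negated character classes, and
all matches are collected with a /g match in list context. The sub name
extract_hosts and the sample HTML are hypothetical and stand in for the
real index.html. Note that, unlike the line-by-line version, this form can
return more than one match per line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tightened pattern: negated character classes instead of .*? and .*,
# so a match attempt cannot run across a quote (or, for the host, a slash).
my $re = qr{href="http://([^"/]*)/[^"]*"\s+target="_blank"};

# Return every host captured by $re in the given text.
sub extract_hosts {
    my ($html) = @_;
    return $html =~ /$re/g;    # list context: one capture per match
}

# Hypothetical sample input standing in for index.html.
my $sample = <<'HTML';
<a href="http://www.example.com/a/b.html" target="_blank">one</a>
<a href="http://news.example.org/" target="_blank">two</a>
<a href="http://skipped.example.net/x">no target attribute</a>
HTML

print "$_\n" for extract_hosts($sample);
```

To run it against the real file, the whole file could be read into one
string first (for example with "local $/;" before reading the handle) and
passed to extract_hosts. Whether this is faster than the original loop
would need to be measured.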


So I'm asking what optimization Perl applies to that regex.
I hope this doesn't bother anyone. Thanks.
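
One way to look into this is the "re" pragma's debug mode, which makes perl
print what the regex compiler and matcher do to STDERR. The sketch below is
only a way to inspect the engine, not a statement of what it will report on
any given perl version. The sub name first_host and the sample line are
hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Enable regex-engine diagnostics for patterns compiled in this scope.
# The diagnostics are written to STDERR, not STDOUT.
use re 'debug';

# Return the host captured by the original pattern, or undef on no match.
sub first_host {
    my ($line) = @_;
    return $line =~ /href="http:\/\/(.*?)\/.*" target="_blank"/ ? $1 : undef;
}

# Hypothetical sample line standing in for one line of index.html.
my $line = '<a href="http://www.example.com/x" target="_blank">x</a>';

my $host = first_host($line);
print "$host\n" if defined $host;
```

In the diagnostics, the compiled program is typically followed by a summary
of the fixed substrings the optimizer pulled out of the pattern (labelled
"anchored" or "floating") and a minimum match length. As far as I
understand it, perl searches for such a substring first and only starts the
full matcher on lines that contain it, which would explain why lines
without a matching link are rejected so cheaply.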

Regards,
Jeff.

