Hello,

Can the code below (especially the regex) be optimized to run faster?
#!/usr/bin/perl
for ($i=0; $i<1000; $i+=1) {
    open HD,"index.html" or die $!;
    while(<HD>) {
        print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
    }
    close HD;
}

The "index.html" file was fetched with:

wget http://www.265.com/Kexue_Jishu/

I ask because someone posted a question on the ruby-talk list showing that Perl's regex is much faster than Ruby's:

[Quote]
#!/usr/bin/ruby
1000.times do
  File.open("index.html").each do |c|
    puts $1 if /href="http:\/\/(.*?)\/.*" target="_blank"/ =~ c
  end
end

time ./test.rb >/tmp/t
elap 6.511 user 6.336 syst 0.136 CPU 99.40%

#!/usr/bin/perl
for ($i=0; $i<1000; $i+=1) {
    open HD,"index.html" or die $!;
    while(<HD>) {
        print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
    }
    close HD;
}

time ./test.pl >/tmp/t
elap 0.864 user 0.844 syst 0.020 CPU 100.04%

So perl is 7 or 8 times faster here.
[/Quote]

But someone else optimized the Ruby code to use Ruby's built-in scan method with a tighter regex, which makes it run a lot faster:

[Quote]
I get best results in Ruby with:

regexp = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}
1000.times do
  puts File.read('index.html').scan(regexp)
end

~/ruby/bench time ruby19 regex.rb > /dev/null
real 0m1.428s
user 0m1.359s
sys 0m0.056s

~/ruby/bench time perl5.10.0 regex.pl > /dev/null
real 0m1.189s
user 0m1.095s
sys 0m0.084s

It's still slower. Perl has regular expression magic beyond my imagination, though. I heard they take the most "rare" character in the literal part of the regex (let's say, the colon) and search for it using machine code, and then work their way backwards to the beginning of the regexp...

Say what you want, but Perl rocks when it comes to text processing speed.
[/Quote]

So my question is: what optimization does Perl apply to that regex? I hope this isn't a bother. Thanks.

Regards,
Jeff.
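A hedged sketch of the idea behind the question, in Perl. Perl's regex compiler looks for literal text that every match must contain (for this pattern, probably "href="http://" and "" target="_blank"") and, before running the backtracking engine, uses a fast string search for it, so lines that lack the literal are rejected almost for free. The code below makes such a prefilter explicit with index() purely for illustration (Perl already does something similar internally), and ports the tightened regex from the Ruby scan version to Perl using m//g in list context. The helper name extract_hosts and the variable $link_re are hypothetical, not taken from the original posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tightened pattern, ported from the Ruby scan() version quoted above:
# negated character classes instead of .*? and .* so the engine
# cannot run past a closing quote and has less to backtrack over.
my $link_re = qr{href="http://([^"/]*)/[^"]*"\s+target="_blank"};

# Hypothetical helper (not from the original posts): returns every
# host captured by $link_re in $text.
sub extract_hosts {
    my ($text) = @_;
    # Explicit literal-substring prefilter, for illustration only:
    # if the fixed text never occurs, skip the regex engine entirely.
    return () if index($text, 'target="_blank"') < 0;
    # In list context, m//g with one capture group returns all the
    # captures, much like Ruby's String#scan.
    return $text =~ /$link_re/g;
}

# Usage: slurp index.html once per pass, as the Ruby version does.
# Guarded so the sketch still runs when the file is absent.
if (-e 'index.html') {
    for (1 .. 1000) {
        open my $fh, '<', 'index.html' or die "index.html: $!";
        my $html = do { local $/; <$fh> };
        close $fh;
        print "$_\n" for extract_hosts($html);
    }
}
```

To see which substrings Perl's optimizer actually picked for the original pattern, one can run perl -Mre=debug -e '"x" =~ /href="http:\/\/(.*?)\/.*" target="_blank"/' and read the compile dump printed to STDERR; it typically names "anchored" and "floating" substrings. Note that the tightened pattern is not strictly equivalent to the original: [^"]* cannot cross a closing quote while .* can, so the two may give different results on lines containing several links.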