2010/1/5 Jeff Peng <jeffp...@netzero.net>:
> Hello,
>
> Can the code (specially the regex) below be optimized to run faster?
>
> #!/usr/bin/perl
> for ($i=0; $i<1000; $i+=1) {
>
>  open HD,"index.html" or die $!;
>  while(<HD>) {
>   print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
>  }
>  close HD;
> }

Theoretically it can be optimized, but the code is fairly basic, so
you'd have to get right down into internals and magic to gain
anything from it.

>
> I ask this because someone posted a question on ruby-talk list, shows
> perl's regex is much faster than ruby's.

I have to stop here and mention that this benchmark doesn't say
anything specific about Perl vs. Ruby regex capabilities: every one of
the 1000 iterations opens index.html, reads it line by line and prints
the results, and only then comes the regex matching. A true comparison
of the two regex engines would have to work on something that involves
none of that.
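
A minimal sketch of what timing only the matching could look like. The
sample lines, the example.com host names and the extract_hosts helper
are made up for illustration (they stand in for the real index.html),
and Time::HiRes is just one way to take the measurement:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical sample data standing in for index.html, built in memory
# so that no file I/O ends up inside the timed section.
my @lines;
push @lines,
    qq{<a href="http://host$_.example.com/page.html" target="_blank">link</a>\n}
    for 1 .. 1000;

# Return the host names captured from the given lines, using the
# same pattern as the original script.
sub extract_hosts {
    my @hosts;
    for my $line (@_) {
        push @hosts, $1
            if $line =~ /href="http:\/\/(.*?)\/.*" target="_blank"/;
    }
    return @hosts;
}

# Time 1000 passes of pure matching; nothing is printed inside the loop.
my $start = time;
my @hosts;
@hosts = extract_hosts(@lines) for 1 .. 1000;
my $elapsed = time - $start;

printf "%d hosts per pass, 1000 passes in %.3f s\n", scalar @hosts, $elapsed;
```

The same structure (data built once, output kept out of the loop) would
be needed on the Ruby side before the two numbers could be compared.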

>
> But someone another optimized the ruby code and used ruby's built-in
> scan method, which makes the regex run a lot faster.
>
> [Quote]
> I get best results in Ruby with:
>
>  regexp = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}
>  1000.times do
>  puts File.read('index.html').scan(regexp)
>  end

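Compiling the regex once, outside the loop, is something Perl does
inherently. A literal pattern like yours is compiled a single time,
when the script itself is compiled. For a pattern that interpolates
variables, Perl (since version 5.6) recompiles it only when the
interpolated value changes. Prior to 5.6 you'd need to add the /o
modifier to get that behaviour.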
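
For the case where the pattern is built from a variable, here is a
small sketch using qr//, which compiles the pattern once and hands back
a reusable regex object. The $target variable, the hosts_in helper and
the sample lines are made up for illustration; a literal pattern like
the one in the original script needs none of this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical attribute value; the pattern is built from a variable
# on purpose, to show the interpolated case.
my $target = '_blank';

# qr// compiles the interpolated pattern here, once, and returns a
# regex object that can be reused for every match below.
my $link_re = qr/href="http:\/\/(.*?)\/.*" target="\Q$target\E"/;

# Return the host names captured from the given lines.
sub hosts_in {
    my @hosts;
    for my $line (@_) {
        push @hosts, $1 if $line =~ $link_re;
    }
    return @hosts;
}

my @found = hosts_in(
    qq{<a href="http://www.example.com/a/b.html" target="_blank">x</a>},
    qq{<a href="http://www.example.org/" target="_self">y</a>},
);
print "$_\n" for @found;   # prints www.example.com
```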

> It's still slower. Perl has regular expression magic beyond my
> imagination, though. I heard they take the most "rare" character in the
> literal part of the regex (let's say, the colon) and search for it using
> machine code, and then work their way backwards to the beginning of the
> regexp...

Perl is fast, and many parts of its regex engine are highly optimized.
A lot of that comes from regexes being part of the language and not a
module, which means the optimizations live in the language
implementation itself. However, both Perl and Ruby use a recursive
backtracking engine, which gives you a lot of features at the expense
of some speed compared with a more basic implementation like the one
in Unix grep.
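
To make the backtracking point concrete, here is a small sketch that
puts the original pattern next to a Perl rendering of the Ruby one
quoted above. The negated character classes cannot run past the closing
quote, so they leave the engine less to back out of than .*? and .* do.
The sample line is made up, and the sketch only shows that both patterns
capture the same host here; it makes no claim about timings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Original pattern: lazy .*? for the host, greedy .* for the rest of
# the URL, which can run to the end of the line and then back up.
my $loose = qr/href="http:\/\/(.*?)\/.*" target="_blank"/;

# Tighter pattern, a Perl rendering of the Ruby regexp quoted above.
my $tight = qr{href="http://([^"/]*)/[^"]*"\s+target="_blank"};

# Hypothetical sample line.
my $line =
    qq{<a href="http://www.example.com/docs/index.html" target="_blank">docs</a>};

my @captured;
for my $re ($loose, $tight) {
    push @captured, $1 if $line =~ $re;
}
print "$_\n" for @captured;   # prints www.example.com twice
```

Note the two patterns are not strictly equivalent: the tighter one also
accepts any run of whitespace before target, and it never matches
across a double quote.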

-- 
Erez

"The government forgets that George Orwell's 1984 was a warning, and
not a blueprint"
http://www.nonviolent-conflict.org/ -- http://www.whyweprotest.org/

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

