2010/1/5 Jeff Peng <jeffp...@netzero.net>:
> Hello,
>
> Can the code (specially the regex) below be optimized to run faster?
>
> #!/usr/bin/perl
> for ($i=0; $i<1000; $i+=1) {
>
>     open HD,"index.html" or die $!;
>     while(<HD>) {
>         print $1,"\n" if /href="http:\/\/(.*?)\/.*" target="_blank"/;
>     }
>     close HD;
> }
Theoretically it can be optimized, but you'd have to dig deep into the internals and magic to get anything out of code this basic.

> I ask this because someone posted a question on ruby-talk list, shows
> perl's regex is much faster than ruby's.

I have to stop here and point out that this benchmark doesn't say anything specific about Perl vs. Ruby regex capabilities. Besides the regex matching, what you're timing is file I/O (opening and reading index.html 1000 times) and printing. A true comparison of the two regex engines would have to use a test that involves neither.

> But someone another optimized the ruby code and used ruby's built-in
> scan method, which makes the regex run a lot faster.
>
> [Quote]
> I get best results in Ruby with:
>
> regexp = %r{href="http://([^"/]*)/[^"]*"\s+target="_blank"}
> 1000.times do
> puts File.read('index.html').scan(regexp)
> end

Hoisting the regexp out of the loop is something Perl does for you: a literal pattern like yours, with no interpolated variables, is compiled only once, and a pattern that does interpolate variables is recompiled only when the interpolated string changes. The /o modifier ( m/href="http:\/\/(.*?)\/.*" target="_blank"/o ) tells Perl to compile an interpolating pattern only once; it makes no difference for a pattern without variables.

> It's still slower. Perl has regular expression magic beyond my
> imagination, though. I heard they take the most "rare" character in the
> literal part of the regex (let's say, the colon) and search for it using
> machine code, and then work their way backwards to the beginning of the
> regexp...

Perl is fast, and many parts of its regex engine are highly optimized. Much of that comes from regexes being part of the language rather than a module, so the optimizations live in the language implementation itself. However, both Perl and Ruby use backtracking engines, which give you a lot of features at the expense of some speed compared with a simpler implementation such as the one in Unix grep.
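To take file I/O out of the picture and compare your loop with a Perl counterpart of the Ruby scan version, here is a minimal sketch. It assumes a generated in-memory sample instead of the real index.html; the sample host names and the hosts_original/hosts_tight helpers are hypothetical. It borrows the pattern from the Ruby post, whose negated classes [^"/]* and [^"]* stop at the first slash or quote, whereas .*" runs to the end of the line and then backtracks. Timing uses cmpthese from the core Benchmark module. No results are claimed here; they will depend on your perl and your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample standing in for index.html: keeping it in memory
# means the benchmark times the regex work, not opening/reading a file.
my $html = '';
for my $n (1 .. 200) {
    $html .= qq{<a href="http://host$n.example.com/page$n.html" target="_blank">link $n</a>\n};
}

# The pattern from the original post, applied one line at a time.
sub hosts_original {
    my ($text) = @_;
    my @hosts;
    for my $line (split /\n/, $text) {
        push @hosts, $1 if $line =~ /href="http:\/\/(.*?)\/.*" target="_blank"/;
    }
    return @hosts;
}

# Pattern from the Ruby post, compiled once with qr// and reused.
my $tight = qr{href="http://([^"/]*)/[^"]*"\s+target="_blank"};

# Perl counterpart of Ruby's scan: a //g match in list context returns
# the capture from every match in the string. Call it in list context.
sub hosts_tight {
    my ($text) = @_;
    return $text =~ /$tight/g;
}

# Sanity check: on this sample both versions must find the same hosts.
my @orig = hosts_original($html);
my @fast = hosts_tight($html);
die "patterns disagree\n" unless join(',', @orig) eq join(',', @fast);
printf "%d hosts found, first is %s\n", scalar @orig, $orig[0];

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    original => sub { my @h = hosts_original($html) },
    tight    => sub { my @h = hosts_tight($html) },
});
```

The two versions are not strictly equivalent: the original takes at most one match per line and requires exactly one space before target, while the global match returns every link on a line and accepts any whitespace (\s+). On a real page, check that the outputs agree before comparing speed.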
-- 
Erez

"The government forgets that George Orwell's 1984 was a warning, and not a blueprint"
http://www.nonviolent-conflict.org/ -- http://www.whyweprotest.org/