Dan Sugalski:
# I realize that benchmarking the RE engine's a pain, what with no GC so we
# leak until we blow memory and die, but...
#
# I'd like to take a series of regexes that exercise various bits of the
# perl 5 engine and time them against the equivalent perl 6 RE code. I think
# it's important at this point to see where we stand relative to the
# benchmark standard, to see whether this is even worth doing from a speed
# standpoint. (I really, really hope so, but...)
#
# When we've a reasonably comparable engine then I think it'll be time to
# write the text->opcode compiler so we can start turning scalar variables
# and suchlike things into regexes.

There's a big problem with that: Perl 5's regex compiler, pregcomp. Perl 5's
REs will always appear faster because Perl 5 has an intelligent, optimizing
regex compiler. For example, take the following simple regex:

    /a+bc+/

pregcomp will optimize that by searching for a 'b' and working outwards both
ways from there. (Actually, it might search for 'abc' and work from there;
I'm not really sure.) Without considering pregcomp's optimizations, that RE
is pretty easy to write in Parrot:

    RE_0:   #/a+bc+/
            rx_minlength 3
            branch $start
    $advance:
            rx_advance P0, $fail
    $start:
            rx_literal P0, "a", $advance
    $a_loop:
            rx_literal P0, "a", $b
            rx_pushindex P0
            branch $a_loop
    $a_back:
            rx_popindex P0, $advance
    $b:
            rx_literal P0, "bc", $a_back
    $c_loop:
            rx_literal P0, "c", $succeed
            branch $c_loop
    $succeed:
            rx_succeed P0
    $fail:
            rx_fail P0

If you wanted that optimized like pregcomp would, however, you'd have to do
something like:

    RE_0:   #/a+bc+/
            rx_minlength 3
            branch $start
    $advance:
            rx_advance P0, $fail
    $start:
            rx_literal P0, "abc", $advance
    $a:
            rx_pushstart P0
            rx_setbackwards P0
            rx_skipchars P0, 3
    $a_loop:
            rx_literal P0, "a", $after_a
            branch $a_loop
    $after_a:
            rx_info_getindex P0, I0
            rx_info_setstart P0, I0
    $c:
            rx_setforwards P0
            rx_popindex P0, PANIC
            rx_skipchars P0, 3
    $c_loop:
            rx_literal P0, "c", $succeed
            branch $c_loop
    $succeed:
            rx_succeed P0
    $fail:
            rx_fail P0
    PANIC:
            print "Impossible operation occurred in regex"
            end

And that only takes one optimization into account--pregcomp has a big bag of
tricks. Note that rx_setforwards, rx_setbackwards, rx_skipchars,
rx_pushstart, rx_popstart, and rx_info_setstart don't currently exist, though
they could probably be implemented fairly easily.

See my point? Unless there's some way to write opcodes for Perl 5's REs (and
we do all our benchmarking with that), we will probably never come close to
Perl 5's performance until we have a regex compiler close to pregcomp in
quality.

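For readers who don't want to trace the opcodes by hand, the difference between the two strategies for /a+bc+/ can be sketched in Python. This is only an illustration under stated assumptions: the function names `match_naive` and `match_anchored` are made up for this sketch, the anchor is assumed to be the fixed substring "abc" (the post is unsure whether pregcomp uses 'b' or 'abc'), and none of this reflects how pregcomp or the Parrot rx ops are actually implemented.

```python
import re

def match_naive(s):
    """Try /a+bc+/ at every start position, left to right.

    Returns (start, end) of the leftmost match, or None.
    """
    n = len(s)
    for start in range(n):
        i = start
        while i < n and s[i] == "a":      # a+
            i += 1
        if i == start:                    # no 'a' here, advance
            continue
        if i >= n or s[i] != "b":         # b
            continue
        i += 1
        j = i
        while j < n and s[j] == "c":      # c+
            j += 1
        if j == i:                        # no 'c' after 'b', advance
            continue
        return (start, j)
    return None

def match_anchored(s):
    """Find the fixed substring "abc" first, then work outwards.

    Every match of /a+bc+/ must contain "abc", so one fast substring
    search replaces the position-by-position scan.
    """
    pos = s.find("abc")
    if pos == -1:
        return None
    start = pos
    while start > 0 and s[start - 1] == "a":   # extend backwards over a's
        start -= 1
    end = pos + 3
    while end < len(s) and s[end] == "c":      # extend forwards over c's
        end += 1
    return (start, end)

def reference(s):
    """Python's own regex engine, used only as a cross-check."""
    m = re.search(r"a+bc+", s)
    return m.span() if m else None
```

Both functions should agree with `reference` on any input; the anchored version simply does most of its work inside a single substring search, which is roughly the kind of advantage an optimizing regex compiler buys.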
--Brent Dax
[EMAIL PROTECTED]
Configure pumpking for Perl 6

"Nothing important happened today."
    --George III of England's diary entry for 4-Jul-1776