Dan Sugalski:
# I realize that benchmarking the RE engine's a pain, what with no GC so
# we leak until we blow memory and die, but...
#
# I'd like to take a series of regexes that exercise various bits of the
# perl 5 engine and time them against the equivalent perl 6 RE code. I
# think it's important at this point to see where we stand relative to
# the benchmark standard, to see whether this is even worth doing from a
# speed standpoint. (I really, really hope so, but...)
#
# When we've a reasonably comparable engine then I think it'll be time to
# write the text->opcode compiler so we can start turning scalar variables
# and suchlike things into regexes.

There's a big problem with that: Perl 5's regex compiler, pregcomp.

Perl 5's REs will always appear faster because Perl 5 has an
intelligent, optimizing regex compiler.  For example, take the following
simple regex:

        /a+bc+/

pregcomp will optimize that by searching for a 'b' and working outwards
both ways from there.  (Actually, it might search for 'abc' and work
from there; I'm not really sure.)  Without considering pregcomp's
optimizations, that RE is pretty easy to write in Parrot:

RE_0:
        #/a+bc+/
        rx_minlength 3
        branch $start
$advance:
        rx_advance P0, $fail
$start:
        rx_literal P0, "a", $advance
$a_loop:
        rx_literal P0, "a", $b
        rx_pushindex P0
        branch $a_loop
$a_back:
        rx_popindex P0, $advance
$b:
        rx_literal P0, "bc", $a_back
$c_loop:
        rx_literal P0, "c", $succeed
        branch $c_loop
$succeed:
        rx_succeed P0
$fail:
        rx_fail P0
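To make the unoptimized strategy concrete, here is a rough Python model of what that opcode sequence does. It's a sketch, not Parrot code: `naive_match` is a hypothetical name, and it returns the leftmost match as a (start, end) pair or None. It tries each start position in turn, which is exactly the work pregcomp's optimizations are designed to avoid.

```python
def naive_match(s: str):
    """Unoptimized /a+bc+/ search: try every start position in turn.

    Returns (start, end) of the leftmost match, or None.
    Backtracking into a+ is omitted because giving back an 'a'
    can never let the following "bc" match.
    """
    # Like rx_minlength 3: nothing shorter than "abc" can match.
    if len(s) < 3:
        return None
    for start in range(len(s)):
        i = start
        # a+ needs at least one 'a'; otherwise advance to the next position.
        if s[i] != "a":
            continue
        i += 1
        # Greedily take any further 'a's (the $a_loop).
        while i < len(s) and s[i] == "a":
            i += 1
        # "bc" must follow; otherwise advance to the next start position.
        if not s.startswith("bc", i):
            continue
        i += 2
        # The 'c' in "bc" satisfies c+; greedily take any extra 'c's.
        while i < len(s) and s[i] == "c":
            i += 1
        return (start, i)
    return None
```

For example, `naive_match("xxaabccd")` gives `(2, 7)`, after rejecting start positions 0 and 1 one character at a time.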

If you wanted that optimized like pregcomp would, however, you'd have to
do something like:

RE_0:
        #/a+bc+/
        rx_minlength 3
        branch $start
$advance:
        rx_advance P0, $fail
$start:
        rx_literal P0, "abc", $advance
$a:
        rx_pushstart P0
        rx_setbackwards P0
        rx_skipchars P0, 3
$a_loop:
        rx_literal P0, "a", $after_a
        branch $a_loop
$after_a:
        rx_info_getindex P0, I0
        rx_info_setstart P0, I0
$c:
        rx_setforwards P0
        rx_popindex P0, PANIC
        rx_skipchars P0, 3
$c_loop:
        rx_literal P0, "c", $succeed
        branch $c_loop
$succeed:
        rx_succeed P0
$fail:
        rx_fail P0

PANIC:
        print "Impossible operation occurred in regex"
        end

And that only takes one optimization into account--pregcomp has a big
bag of tricks.  Note that rx_setforwards, rx_setbackwards, rx_skipchars,
rx_pushstart, rx_popstart, and rx_info_setstart don't currently exist,
though they could probably be implemented fairly easily.
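For comparison, here is a Python sketch of the literal-anchored strategy the second listing models. Again it's hypothetical: `anchored_match` is a made-up name, Python's `str.find` stands in for whatever substring search pregcomp really uses, and anchoring on "abc" follows the listing above rather than anything confirmed about pregcomp. Because every match of /a+bc+/ has to contain "abc" (the last 'a', the 'b', and the first 'c'), the first occurrence of that literal pins down the leftmost match.

```python
def anchored_match(s: str):
    """/a+bc+/ search that first locates the required literal "abc".

    Returns (start, end) of the leftmost match, or None.
    """
    # One fast substring search replaces the per-position loop.
    pos = s.find("abc")
    if pos == -1:
        return None
    # Walk backwards over any extra 'a's (the rx_setbackwards step).
    start = pos
    while start > 0 and s[start - 1] == "a":
        start -= 1
    # Skip past "abc" and walk forwards over any extra 'c's.
    end = pos + 3
    while end < len(s) and s[end] == "c":
        end += 1
    return (start, end)
```

On a long string with no match, the first version inspects every start position, while this one hands the whole scan to a single substring search.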

See my point?  Unless we find some way to write opcodes for Perl 5's REs
(and do all our benchmarking with those), we probably won't come close
to Perl 5's performance until we have a regex compiler of comparable
quality.

--Brent Dax
Configure pumpking for Perl 6

"Nothing important happened today."
    --George III of England's diary entry for 4-Jul-1776
