There's a marked improvement in speed: the one benchmark file that took 13 minutes now finishes in under 8. I haven't tried the full data set yet, which is a file five times larger.
This is my first real attempt at anything to do with Perl 6 rules. I'm just learning as I go, using Synopsis 5 for reference.
regexdna.pir
On Dec 16, 2005, at 11:35 PM, Patrick R. Michaud wrote:
I don't know all of the details and restrictions of the benchmark, and I'll be the first to claim that PGE can be slow at times (it has very few optimizations built-in). But we may have a few tricks available to try.

First, note that <gt> is a subrule, and subrules involve extra subroutine call overhead (with a lot of setup and take-down). Using C<< \> >> should be much, much faster, as it's a simple string comparison.

Instead of repeatedly calling the pattern via "next", I'd just use a quantified capture and get all of the things to be stripped all at once. Thus perhaps something like:

    pattern = '[ ( [ \> \N*: ] \n ) | \N*: (\n) ]*'
    rulesub = p6rule_compile(pattern)
    match = rulesub(seq)

This gives us a single match object, with match[0] as an array of the captured portions. We can then just walk through the captured portions (in reverse order) and remove the substrings -- something like:

    .local pmc capt
    capt = match[0]                # capt is an array of Match
  stripfind:
    unless capt goto endstripfind
    $P0 = pop capt                 # remove last capture
    $I0 = $P0."from"()             # get starting pos
    $I1 = $P0."to"()               # get ending pos
    $I1 -= $I0                     # convert to length
    substr seq, $I0, $I1, ''       # remove unwanted portion
    goto stripfind
  endstripfind:

Hope this helps at least a little bit. It's still likely to be somewhat slow. We may also be able to get some improvements by implementing the :g modifier for the repeated captures, and by being able to compile (or use) whole substitutions as opposed to just rules.

Pm
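To make the technique above concrete outside of PIR, here is a small Python analogue (my own illustrative sketch, not part of the original patch): match every span to be stripped -- FASTA description lines (">..." plus their newline) and bare newlines -- in a single pass, record the (from, to) offsets, then delete them in reverse order so that earlier offsets remain valid while the string shrinks.

    import re

    # Hypothetical sample input in the regex-dna FASTA style.
    seq = ">one\nAGGGTAA\nCCTA\n>two\nGGCC\n"

    # One quantified pass: header lines (with newline) or lone newlines.
    spans = [m.span() for m in re.finditer(r">[^\n]*\n|\n", seq)]

    # Walk the captured spans in reverse, like popping the capture array,
    # and splice each one out of the string.
    for start, end in reversed(spans):
        seq = seq[:start] + seq[end:]

    print(seq)  # AGGGTAACCTAGGCC

Deleting back-to-front is the key design point: removing a span never disturbs the offsets of spans that come before it, so no position bookkeeping is needed.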