On Tue, Jul 11, 2006 at 02:41:19PM -0600, Kevin Tew wrote:
> It parses my simple puts.rb example, but parse time is really slow..  2 
> minutes.
> I'm sure I've made some dumb grammar mistakes that are slowing it down.

Well, the first thing to note is that subrule calls can be comparatively
slow, so I think you might get a huge improvement by eliminating
the <sp> subrule from 

    token ws {[<sp>|<[\t]>]*}

resulting in

    token ws { <[ \t]>* }

(Also, <sp> is a capturing subrule, so that means a separate Match 
object is being created and stored for every space encountered
in the source program.  In such cases <?sp> might be better.)
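
To make the capture cost concrete, here is a rough sketch in Python
rather than PGE.  The Match class and both function names are made up
for illustration and say nothing about how PGE is implemented; the
only point is that the capturing form builds one object per space,
while the character-class form builds none.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical stand-in for a per-subrule match result.
    start: int
    end: int


def ws_with_capturing_subrule(text, pos=0):
    # Mimics the shape of [<sp>|<[\t]>]*: each space is matched by a
    # capturing "sp" call, so each space produces its own Match.
    captures = []
    while pos < len(text):
        if text[pos] == " ":
            captures.append(Match(pos, pos + 1))
            pos += 1
        elif text[pos] == "\t":
            pos += 1
        else:
            break
    return pos, captures


def ws_with_char_class(text, pos=0):
    # Mimics the shape of <[ \t]>*: scan the run of blanks directly,
    # with no subrule call and no capture.
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    return pos, []


source = "    \t  puts 'hi'"
end_a, caps_a = ws_with_capturing_subrule(source)
end_b, caps_b = ws_with_char_class(source)
print("capturing subrule:", end_a, "chars consumed,", len(caps_a), "captures")
print("character class:  ", end_b, "chars consumed,", len(caps_b), "captures")
```

Both versions consume the same leading whitespace; they differ only
in how many objects get created along the way.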

In a similar vein, I think that a rule such as

    rule statement {
          <ALIAS> <fitem> <fitem>
          |<ALIAS> <global_variable> [<global_variable>|<back_reference>]
          |<UNDEF> <undef_list>
          |<statement2> [<IF> |<UNLESS> |<WHILE> |<UNTIL>] <expression_value>
          |<statement2> <RESCUE> <statement>
          |<BEGIN> \{ <compound_statement> \}
          |<END> \{ <compound_statement> \}
          |<command_call>
          |<statement2>
    }

may be quite a bit slower than the more direct

    rule statement {
          alias <fitem> <fitem>
          |alias <global_variable> [<global_variable>|<back_reference>]
          |undef <undef_list>
          |<statement2> [if|unless|while|until] <expression_value>
          |<statement2> rescue <statement>
          |begin \{ <compound_statement> \}
          |end \{ <compound_statement> \}
          |<command_call>
          |<statement2>
    }

but I haven't tested this at all to know if the difference
in speed is significant.  I do know that the regex engine will
have more optimization possibilities with the second form than
with the first.  (If one stylistically prefers that the keyword tokens
not appear as "barewords" in the rule, then <'alias'>, <'undef'>,
etc. work equally well for constant literals.)

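It's also probably worthwhile to avoid backtracking and re-parsing
complex subrules such as <statement2> above.  In the above, a plain
<statement2> without if/unless/while/until/rescue ends up being parsed
three separate times before the rule succeeds: once each in the
if/unless/while/until and rescue alternatives, which fail only after
<statement2> has matched, and once more in the final alternative.
Better might be: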

    rule statement {
          alias <fitem> <fitem>
          |alias <global_variable> [<global_variable>|<back_reference>]
          |undef <undef_list>
          |begin \{ <compound_statement> \}
          |end \{ <compound_statement> \}
          |<statement2> [ [if|unless|while|until] <expression_value>
                        | rescue <statement> 
                        ]?
          |<command_call>
    }

(In fact, looking at the grammar I'm not sure that <command_call>
is really needed, since <statement2> already covers that.  But I'm
not a Ruby expert.)
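
The re-parsing can be counted with a small hand-written model in
Python (not PGE).  Everything here is an assumption for illustration:
the Parser class is hypothetical, <statement2> and <expression_value>
are reduced to "a single non-keyword name", and the alias, undef,
begin, end and <command_call> alternatives are left out.  It only
shows how often statement2 gets called for a plain statement.

```python
MODIFIERS = {"if", "unless", "while", "until"}
KEYWORDS = MODIFIERS | {"rescue"}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.statement2_calls = 0

    def peek(self, pos):
        return self.tokens[pos] if pos < len(self.tokens) else None

    def _name(self, pos):
        # Simplification: a "name" is any identifier that is not a keyword.
        tok = self.peek(pos)
        if tok is not None and tok.isidentifier() and tok not in KEYWORDS:
            return pos + 1
        return None

    def statement2(self, pos):
        self.statement2_calls += 1
        return self._name(pos)

    def expression_value(self, pos):
        return self._name(pos)

    def statement_unfactored(self, pos):
        # <statement2> [if|unless|while|until] <expression_value>
        p = self.statement2(pos)
        if p is not None and self.peek(p) in MODIFIERS:
            q = self.expression_value(p + 1)
            if q is not None:
                return q
        # <statement2> rescue <statement>
        p = self.statement2(pos)
        if p is not None and self.peek(p) == "rescue":
            q = self.statement_unfactored(p + 1)
            if q is not None:
                return q
        # <statement2>
        return self.statement2(pos)

    def statement_factored(self, pos):
        # <statement2> [ [if|...] <expression_value> | rescue <statement> ]?
        p = self.statement2(pos)
        if p is None:
            return None
        if self.peek(p) in MODIFIERS:
            q = self.expression_value(p + 1)
            if q is not None:
                return q
        elif self.peek(p) == "rescue":
            q = self.statement_factored(p + 1)
            if q is not None:
                return q
        return p  # optional tail absent, or it failed to match


unfactored = Parser(["foo"])
unfactored.statement_unfactored(0)
factored = Parser(["foo"])
factored.statement_factored(0)
print("unfactored statement2 calls:", unfactored.statement2_calls)
print("factored statement2 calls:  ", factored.statement2_calls)
```

In this model the unfactored rule calls statement2 three times for a
lone "foo" and the factored rule calls it once; with a real, expensive
<statement2> the gap would be in parse time rather than call counts.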

Anyway, let me know if any of the above suggestions make sense
or provide any form of improvement in parsing speed.

Thanks!

Pm

> Patrick R. Michaud wrote:
> >On Fri, Jul 07, 2006 at 10:07:57AM -0600, Kevin Tew wrote:
> >  
> >>I based the initial PGE grammar for PRuby off of  
> >>svn://rubyforge.org/var/svn/rubygrammar/grammars/antlr-v3/trunk/ruby.g 
> >>which is incomplete.
> >>I'm looking for a BNF style description of the Ruby grammar.  Otherwise 
> >>I will have to dig into :pserver:[EMAIL PROTECTED]:/src/parse.y.
> >>    
> >
> >I'll be glad to provide any help that I can in building a PGE
> >version of the grammar -- just let me know where I can help.
> >
> >Pm
> >  
