On Tue, Jul 11, 2006 at 02:41:19PM -0600, Kevin Tew wrote:
> It parses my simple puts.rb example, but parse time is really slow.. 2
> minutes.
> I'm sure I've made some dumb grammar mistakes that is slowing it down.

Well, the first thing to note is that subrule calls can be comparatively
slow, so I think you might get a huge improvement by eliminating the
<sp> subrule from

    token ws {[<sp>|<[\t]>]*}

resulting in

    token ws { <[ \t]>* }

(Also, <sp> is a capturing subrule, so a separate Match object is
being created and stored for every space encountered in the source
program. In such cases the non-capturing <?sp> might be better.)

In a similar vein, I think that a rule such as

    rule statement {
        <ALIAS> <fitem> <fitem>
        |<ALIAS> <global_variable> [<global_variable>|<back_reference>]
        |<UNDEF> <undef_list>
        |<statement2> [<IF> |<UNLESS> |<WHILE> |<UNTIL>] <expression_value>
        |<statement2> <RESCUE> <statement>
        |<BEGIN> \{ <compound_statement> \}
        |<END> \{ <compound_statement> \}
        |<command_call>
        |<statement2>
    }

may be quite a bit slower than the more direct

    rule statement {
        alias <fitem> <fitem>
        |alias <global_variable> [<global_variable>|<back_reference>]
        |undef <undef_list>
        |<statement2> [if|unless|while|until] <expression_value>
        |<statement2> rescue <statement>
        |begin \{ <compound_statement> \}
        |end \{ <compound_statement> \}
        |<command_call>
        |<statement2>
    }

but I haven't tested this at all, so I don't know whether the difference
in speed is significant. I do know that the regex engine will have more
optimization possibilities with the second form than with the first.
(If one stylistically prefers that the keyword tokens not appear as
"barewords" in the rule, then <'alias'>, <'undef'>, etc. work equally
well for constant literals.)

It's also probably worthwhile to avoid backtracking and re-parsing
complex subrules such as <statement2> above. In the rule above, a plain
<statement2> without if/unless/while/until/rescue ends up being parsed
three separate times before the rule succeeds: once for the modifier
alternative, once for the rescue alternative, and once more for the
final bare <statement2> alternative.

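To make the re-parsing cost concrete, here is a toy recursive-descent
model in Python. It is not PGE and says nothing about PGE's actual
timings; it only counts how often a stand-in for <statement2> gets
called when the alternatives each restart from <statement2>, compared
with parsing it once and then trying an optional suffix. The function
names, the token lists, and the one-token stand-ins for
<expression_value> and <statement> are all assumptions made up for the
illustration.

```python
# Toy model (NOT PGE): count how often a stand-in for <statement2> runs.
MODIFIERS = {"if", "unless", "while", "until"}
KEYWORDS = MODIFIERS | {"rescue"}
calls = {"statement2": 0}


def statement2(tokens, pos):
    """Hypothetical stand-in for <statement2>: one non-keyword token."""
    calls["statement2"] += 1
    if pos < len(tokens) and tokens[pos] not in KEYWORDS:
        return pos + 1
    return None


def word(tokens, pos, allowed):
    """Match one literal keyword from `allowed`; None propagates failure."""
    if pos is not None and pos < len(tokens) and tokens[pos] in allowed:
        return pos + 1
    return None


def operand(tokens, pos):
    """Stand-in for <expression_value> or a nested <statement>."""
    if pos is not None and pos < len(tokens) and tokens[pos] not in KEYWORDS:
        return pos + 1
    return None


def statement_unfactored(tokens):
    """Each alternative starts over with statement2, as in the original rule."""
    end = operand(tokens, word(tokens, statement2(tokens, 0), MODIFIERS))
    if end is not None:
        return end
    end = operand(tokens, word(tokens, statement2(tokens, 0), {"rescue"}))
    if end is not None:
        return end
    return statement2(tokens, 0)


def statement_factored(tokens):
    """Parse statement2 once, then try an optional modifier/rescue suffix."""
    end = statement2(tokens, 0)
    if end is None:
        return None
    suffix = operand(tokens, word(tokens, end, KEYWORDS))
    return suffix if suffix is not None else end


def count_calls(parser, tokens):
    """Return (end position, number of statement2 calls) for one parse."""
    calls["statement2"] = 0
    result = parser(tokens)
    return result, calls["statement2"]


if __name__ == "__main__":
    for toks in (["foo"], ["foo", "rescue", "bar"], ["foo", "if", "bar"]):
        print(toks,
              count_calls(statement_unfactored, toks),
              count_calls(statement_factored, toks))
```

In this model a bare statement costs three statement2 calls in the
unfactored form and one in the factored form. With a real <statement2>
that is itself expensive, each repeated attempt redoes that work.
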
Better might be:

    rule statement {
        |alias <fitem> <fitem>
        |alias <global_variable> [<global_variable>|<back_reference>]
        |undef <undef_list>
        |begin \{ <compound_statement> \}
        |end \{ <compound_statement> \}
        |<statement2>
            [ [if|unless|while|until] <expression_value>
            | rescue <statement>
            ]?
        |<command_call>
    }

(In fact, looking at the grammar I'm not sure that <command_call> is
really needed, since <statement2> already covers that. But I'm not a
Ruby expert.)

Anyway, let me know if any of the above suggestions make sense or
provide any form of improvement in parsing speed.

Thanks!

Pm

> Patrick R. Michaud wrote:
> >On Fri, Jul 07, 2006 at 10:07:57AM -0600, Kevin Tew wrote:
> >
> >>I based the initial PGE grammar for PRuby off of
> >>svn://rubyforge.org/var/svn/rubygrammar/grammars/antlr-v3/trunk/ruby.g
> >>which is in complete.
> >>I'm looking for a BNF style description of the Ruby grammar. Otherwise
> >>I will have to dig into :pserver:[EMAIL PROTECTED]:/src/parse.y.
> >>
> >
> >I'll be glad to provide any help that I can in building a PGE
> >version of the grammar -- just let me know where I can help.
> >
> >Pm
> >