Re: [ANN] Instaparse 1.0.0

Mark Engelberg Fri, 12 Apr 2013 00:51:40 -0700

On Thu, Apr 11, 2013 at 7:41 PM, Dmitry Kakurin <dmitry.kaku...@gmail.com>wrote:


> But your parser rules are somewhat new to me.
> Both variations are accepted:
>
>      add = add-sub <'+'> add-sub
>      add = mul-div <'+'> add-sub
>
> And in both cases some generated parsers are correct (arithmetically
> speaking :-) ), but I'd like to understand rules for the first/default
> parser.
> Could you clarify it a little please?
>

Let's start with a simpler example, a grammar that only involves + and
numbers (no handling for parens or whitespace).

(def addition
  (insta/parser
    "plus = plus <'+'> plus | num
     num = #'[0-9]'+"))

If you call (parses addition "1+2+3"), you'll see that that two parse
results are returned.  This grammar is ambiguous and you'll get one parse
result that corresponds to (1+2)+3 and another parse result that
corresponds to 1+(2+3).

I gather from your questions that you really want to know which of these
parse results will come first (i.e., which one will be returned if you only
ask for one parse result).  The answer is:  I make no guarantee as to which
will come first.  (I'm not just being ornery; this lack of guarantee is
important for my future plans for a concurrent version).

Now in this particular case, since addition is associative, maybe we're
perfectly happy to let the ambiguity stand and let instaparse just pick one
of the two possible results for us.  But if we don't want that ambiguity,
we must alter the grammar until we see that parses always returns one
result.  In the above grammar, there are a couple straightforward ways to
resolve the disambiguation:

(def addition-associate-right
  (insta/parser
    "plus = num <'+'> plus | num
     num = #'[0-9]'+"))

(def addition-associate-left
  (insta/parser
    "plus = plus <'+'> num | num
     num = #'[0-9]'+"))

You mentioned that you were familiar with LL grammars.  So this is similar
to LL in the sense that if you want a precise result, you need to make the
grammar precise -- you can't rely on external precedence or associativity
rules.  However, the resulting disambiguated grammars are much more natural.

Consider the above addition-associate-left grammar -- LL would choke on
this because it is left-recursive.  If you want to associate to the left in
LL, you have to transform the natural-looking left-recursive rule into
something much messier.  This is an unfortunate limitation of LL grammars.
For correct calculation, arithmetic operators of equal precedence really
need to associate to the left (i.e., 1-2-3 should evaluate as (1-2)-3, not
1-(2-3)), and left recursion is the easiest way to express that.

To better understand this point, take a look at:
http://www.csse.monash.edu.au/~lloyd/tildeProgLang/Grammar/Top-Down/

First, it shows the arithmetic grammar you'd *want* to write.  You'll see
it looks almost identical to the grammar I used in the tutorial (although I
added a root tag and broke out the operators individually for evaluation
purposes).

Then, it goes on to show that in LL, you can't actually write the grammar
the way you'd want to write it.  It demonstrates two transformations that
eventually turn the grammar into something LL can handle.

The beauty of the tutorial example is that it shows you can get a
completely unambiguous grammar, just by writing it down in the natural way
that expresses the precedence and associativity constraints.  Unlike LL,
the elegant way just works.

So, to summarize the main points here:

* As you develop your grammar, test your grammar with the parses function
to determine whether your grammar is ambiguous.
* If your grammar is ambiguous, but the ambiguity doesn't cause a problem,
then that's fine, just leave it ambiguous.
* If you need to disambiguate, you must do it by building more precision
into the grammar.
* Don't need to worry about limitations of LL, but the mindset of top-down
parsing is similar.

I hope this helps!

--Mark

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [ANN] Instaparse 1.0.0

Reply via email to