Thanks Mark, Here is my understanding and few more questions (please correct me if I'm missing something).
plus = plus <'+'> num | num is unambiguous left-associative. It is valid and is guaranteed not to cause infinite recursion. plus = num <'+'> plus | num is unambiguous right-associative. Also valid and is guaranteed not to cause infinite recursion. plus = plus <'+'> plus | num is ambiguous. It's allowed but the default parser choice is unpredictable by design. Is it even consistent across runs? (I'm guessing yes for now) And across future instaparse versions? (I'm guessing no) Is there a way to check if my parser is ambiguous without feeding it some input and then using insta/parses fn? Basically after calling insta/parser I'd like to learn if grammar is OK. Here is where my question is coming from: If I were to use such parser in production I'd like it to be unambiguous. And I'd like to detect ambiguity early, before my software ships/deployed. Preferably during build/packaging/deployment time. But since for Clojure projects all these things are somewhat fuzzy, at very least I'd like to detect ambiguity during my app startup. I.e. I'd like to put a big fat assert during initialization phase. Is there a way to do it now (or planned in the future)? Thanks for being patient with me :-), Dmitry. On Friday, April 12, 2013 12:51:36 AM UTC-7, puzzler wrote: > > On Thu, Apr 11, 2013 at 7:41 PM, Dmitry Kakurin > <dmitry....@gmail.com<javascript:> > > wrote: > >> But your parser rules are somewhat new to me. >> Both variations are accepted: >> >> add = add-sub <'+'> add-sub >> add = mul-div <'+'> add-sub >> >> And in both cases some generated parsers are correct (arithmetically >> speaking :-) ), but I'd like to understand rules for the first/default >> parser. >> Could you clarify it a little please? >> > > Let's start with a simpler example, a grammar that only involves + and > numbers (no handling for parens or whitespace). > > (def addition > (insta/parser > "plus = plus <'+'> plus | num > num = #'[0-9]'+")) > > If you call (parses addition "1+2+3"), you'll see that that two parse > results are returned. This grammar is ambiguous and you'll get one parse > result that corresponds to (1+2)+3 and another parse result that > corresponds to 1+(2+3). > > I gather from your questions that you really want to know which of these > parse results will come first (i.e., which one will be returned if you only > ask for one parse result). The answer is: I make no guarantee as to which > will come first. (I'm not just being ornery; this lack of guarantee is > important for my future plans for a concurrent version). > > Now in this particular case, since addition is associative, maybe we're > perfectly happy to let the ambiguity stand and let instaparse just pick one > of the two possible results for us. But if we don't want that ambiguity, > we must alter the grammar until we see that parses always returns one > result. In the above grammar, there are a couple straightforward ways to > resolve the disambiguation: > > (def addition-associate-right > (insta/parser > "plus = num <'+'> plus | num > num = #'[0-9]'+")) > > (def addition-associate-left > (insta/parser > "plus = plus <'+'> num | num > num = #'[0-9]'+")) > > You mentioned that you were familiar with LL grammars. So this is similar > to LL in the sense that if you want a precise result, you need to make the > grammar precise -- you can't rely on external precedence or associativity > rules. However, the resulting disambiguated grammars are much more natural. > > Consider the above addition-associate-left grammar -- LL would choke on > this because it is left-recursive. If you want to associate to the left in > LL, you have to transform the natural-looking left-recursive rule into > something much messier. This is an unfortunate limitation of LL grammars. > For correct calculation, arithmetic operators of equal precedence really > need to associate to the left (i.e., 1-2-3 should evaluate as (1-2)-3, not > 1-(2-3)), and left recursion is the easiest way to express that. > > To better understand this point, take a look at: > http://www.csse.monash.edu.au/~lloyd/tildeProgLang/Grammar/Top-Down/ > > First, it shows the arithmetic grammar you'd *want* to write. You'll see > it looks almost identical to the grammar I used in the tutorial (although I > added a root tag and broke out the operators individually for evaluation > purposes). > > Then, it goes on to show that in LL, you can't actually write the grammar > the way you'd want to write it. It demonstrates two transformations that > eventually turn the grammar into something LL can handle. > > The beauty of the tutorial example is that it shows you can get a > completely unambiguous grammar, just by writing it down in the natural way > that expresses the precedence and associativity constraints. Unlike LL, > the elegant way just works. > > So, to summarize the main points here: > > * As you develop your grammar, test your grammar with the parses function > to determine whether your grammar is ambiguous. > * If your grammar is ambiguous, but the ambiguity doesn't cause a problem, > then that's fine, just leave it ambiguous. > * If you need to disambiguate, you must do it by building more precision > into the grammar. > * Don't need to worry about limitations of LL, but the mindset of top-down > parsing is similar. > > I hope this helps! > > --Mark > -- -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en --- You received this message because you are subscribed to the Google Groups "Clojure" group. To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.