Re: instaparse questions

Mark Engelberg Mon, 18 Nov 2013 15:52:11 -0800

Also, the version in the tutorial called "preferential-tokenizer" behaves
the way you would like.  This is actually a really good illustration of the
difference between the two approaches of negative lookahead versus ordered
choice.


The unambiguous-tokenizer, by saying "<token> = keyword | !keyword
identifier", uses negative lookahead to rigidly specify that a token is not
a valid identifier if it starts with a keyword.  The preferential-tokenizer
simply says "<token> = keyword / identifier", using ordered choice, i.e.,
the keyword interpretation is *preferred* over identifier.  The preference
approach is more flexible: the parser begins by interpreting the "cond" in
"condid" as a keyword, but when this doesn't lead to a valid parse (because
there's no whitespace after "cond"), it backtracks and tries interpreting
it as an identifier.
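
To make the difference concrete, here is a minimal Python sketch.  It is
not instaparse and not how instaparse is implemented; it just mimics the
two token rules with a tiny hand-written backtracking matcher.  The function
names, the second keyword "defn", and the sentence rule (tokens separated
by whitespace) are assumptions chosen for illustration.

```python
import re

KEYWORDS = ("cond", "defn")          # assumed keyword set for this sketch
IDENT = re.compile(r"[a-zA-Z]+")
WS = re.compile(r"\s+")

def keyword(text, pos):
    # Yield every way a keyword can match at pos (at most one here).
    for kw in KEYWORDS:
        if text.startswith(kw, pos):
            yield ("keyword", kw), pos + len(kw)

def identifier(text, pos):
    m = IDENT.match(text, pos)
    if m:
        yield ("identifier", m.group()), m.end()

def token_unambiguous(text, pos):
    # <token> = keyword | !keyword identifier
    yield from keyword(text, pos)
    if next(keyword(text, pos), None) is None:   # negative lookahead
        yield from identifier(text, pos)

def token_preferential(text, pos):
    # <token> = keyword / identifier
    # Keyword is tried first; identifier stays available as a fallback.
    yield from keyword(text, pos)
    yield from identifier(text, pos)

def sentence(token, text, pos=0):
    # sentence = token (whitespace token)*
    for tok, end in token(text, pos):
        if end == len(text):
            yield [tok]
            continue
        ws = WS.match(text, end)
        if ws:
            for rest in sentence(token, text, ws.end()):
                yield [tok] + rest

def parse(token, text):
    # First successful parse in preference order, or None on failure.
    return next(sentence(token, text), None)
```

With these definitions, parse(token_unambiguous, "condid") returns None,
because the lookahead forbids an identifier that starts with "cond", while
parse(token_preferential, "condid") first tries the keyword, hits the
missing whitespace, and falls back to [("identifier", "condid")].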

As I pointed out in the last post, you can "fix" the unambiguous-tokenizer
by clearly specifying with regexes that the tokens must end at word
boundaries, but the preferential-tokenizer example is another way to get
the behavior you're looking for.
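
For comparison, the word-boundary fix can be sketched in Python as well.
This is only an illustration of the idea using Python's re module, not the
instaparse grammar itself; the names KEYWORD, IDENT, WS and tokenize, and
the keyword "defn", are assumptions.  Because the keyword regex ends in \b,
it no longer matches the "cond" inside "condid", so the negative lookahead
stops rejecting that identifier and no backtracking is needed.

```python
import re

KEYWORD = re.compile(r"(?:cond|defn)\b")   # keyword must end at a word boundary
IDENT = re.compile(r"[a-zA-Z]+")
WS = re.compile(r"\s+")

def tokenize(text):
    # token = keyword | !keyword identifier, tokens separated by whitespace.
    # Returns a list of (kind, text) pairs, or None if the input is invalid.
    pos, tokens = 0, []
    while True:
        kind, m = "keyword", KEYWORD.match(text, pos)
        if m is None:
            # Keyword failed here, so the !keyword lookahead holds.
            kind, m = "identifier", IDENT.match(text, pos)
        if m is None:
            return None
        tokens.append((kind, m.group()))
        pos = m.end()
        if pos == len(text):
            return tokens
        ws = WS.match(text, pos)
        if ws is None:
            return None
        pos = ws.end()
```

For example, tokenize("condid cond") gives
[("identifier", "condid"), ("keyword", "cond")].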

