Re: instaparse questions

2013-11-18 Thread Mark Engelberg

Also, the version in the tutorial called "preferential-tokenizer" behaves the way you would like. This is actually a really good illustration of the difference between the two approaches of negative lookahead versus ordered choice. The unambiguous-tokenizer, by saying "<token> = keyword | !keyword ident

Re: instaparse questions

2013-11-18 Thread Mark Engelberg

Simplest way is to make the keywords regular expressions that look for a "word boundary" after the keyword:

(def unambiguous-tokenizer-improved
  (insta/parser
    "sentence = token (<whitespace> token)*
     <token> = keyword | !keyword identifier
     whitespace = #'\\s+'
     identifier = #'[a-zA-Z]+'
     keywor
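
To see what the "word boundary" buys you independently of instaparse, here is a minimal sketch using plain Clojure regexes. The names literal-cond and bounded-cond are hypothetical, chosen only for this illustration, and the keyword cond is an assumed example; the point is just that \b refuses to match when the keyword is immediately followed by more letters.

```clojure
;; Hypothetical names, for illustration only.
;; A pattern that behaves like a bare string literal: it matches the
;; prefix "cond" even when more letters follow.
(def literal-cond #"^cond")

;; The same pattern with a word boundary (\b) after the keyword: it only
;; matches when "cond" is not immediately followed by a word character.
(def bounded-cond #"^cond\b")

(re-find literal-cond "conditional") ; matches the prefix "cond"
(re-find bounded-cond "conditional") ; nil, no boundary after "cond"
(re-find bounded-cond "cond x")      ; matches "cond"
```

Inside an instaparse grammar string the backslash has to be doubled, just as in #'\\s+' above, because the grammar is written inside a Clojure string literal.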