I am not familiar with instaparse, but the parser may be reasoning as follows:
- "Exposure" matches a "PK" TOKEN that is preferred over a "WORD" TOKEN, so it is parsed as a "PK" TOKEN
- "to" matches a "WORD" TOKEN that is not preferred, but there's no other choice, so it is parsed as a "WORD" TOKEN
- ...

I don't think that instaparse will consider aggregating more complex TOKENs when it can parse each word as a TOKEN (at worst a "WORD" TOKEN if none of the preferred one-word TOKENs match).

I'd say you have to rework your grammar to be less ambiguous (i.e., only one way to decompose a sentence), but unfortunately I don't know how exactly you should fix it for instaparse.

HTH
-Gianluca

On Monday, November 18, 2013 2:47:22 PM UTC+1, Jim foo.bar wrote:
>
> Hi all,
>
> I'm having a small problem composing smaller matches in instaparse. Here
> is what I'm trying...just observe the bold bits:
>
> (def parsePK
>   (insta/parser
>    "S = TOKEN (SPACE TOKEN PUNCT?)* END
>     TOKEN = (NUM | DRUG | PK | DRUGPK | MECH | SIGN | EFF | ENCLOSED) / WORD
>     <WORD> = #'\\w+' | PUNCT
>     <PUNCT> = #'\\p{Punct}'
>     ENCLOSED = PAREN | SQBR
>     <PAREN> = #'\\[.*\\]'
>     <SQBR> = #'\\(.*\\)'
>     NUM = #'[0-9]+'
>     ADV = #'[a-z]+ly'
>     <SPACE> = #'\\s+'
>     DRUG = #'(?i)didanosine|quinidine|tenofovir'
>     PK = #'(?i)exposure|bioavailability|lower?[\\s|\\-]?clearance'
>     *DRUGPK = PK SPACE TO SPACE DRUG SPACE EFF? SPACE*
>     MECH = #'[a-z]+e(s|d)'
>     *EFF = BE? SPACE SIGN? SPACE MECH | BE? SPACE MECH SPACE ADV?*
>     SIGN = ADV | NEG
>     NEG = 'not'
>     <TO> = 'to' | 'of'
>     <BE> = 'is' | 'are' | 'was' | 'were'
>     END = '.' "))
>
> Running the parser returns the following. It seems that the 2 bigger
> composite rules DRUGPK & EFF are not recognised at all. Only the smaller
> pieces are actually shown. I would expect [:TOKEN [:DRUGPK "Exposure to
> didanosine is increased"]] and [:TOKEN [:EFF "is increased"]] entries.
>
> (pprint
>  (parsePK "Exposure to didanosine is increased when coadministered with
> tenofovir disoproxil fumarate [Table 5 and see Clinical Pharmacokinetics
> (12.3, Tables 9 and 10)]."))
>
> [:S
>  [:TOKEN [:PK "Exposure"]]
>  " "
>  [:TOKEN "to"]
>  " "
>  [:TOKEN [:DRUG "didanosine"]]
>  " "
>  [:TOKEN "is"]
>  " "
>  [:TOKEN [:MECH "increased"]]
>  " "
>  [:TOKEN "when"]
>  " "
>  [:TOKEN [:MECH "coadministered"]]
>  " "
>  [:TOKEN "with"]
>  " "
>  [:TOKEN [:DRUG "tenofovir"]]
>  ","
>  " "
>  [:TOKEN "disoproxil"]
>  " "
>  [:TOKEN "fumarate"]
>  [:END "."]]
>
> Am I thinking about it the wrong way? Can anyone shed some light?
>
> many thanks in advance,
>
> Jim
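
P.S. To make the word-by-word guess at the top of this reply concrete, here is a rough sketch in plain Clojure. It is only an illustration of that guess and says nothing about how instaparse actually works internally. The names `preferred`, `classify-word` and `tokenize` are hypothetical, and the regexes are a subset copied from the quoted grammar.

```clojure
(require '[clojure.string :as str])

;; Preferred one-word categories, tried in order. Regexes are copied from the
;; quoted grammar (a subset only; the multi-word PK alternative is left out).
(def preferred
  [[:NUM  #"[0-9]+"]
   [:DRUG #"(?i)didanosine|quinidine|tenofovir"]
   [:PK   #"(?i)exposure|bioavailability"]
   [:MECH #"[a-z]+e(s|d)"]])

(defn classify-word
  "Hypothetical helper: tag a word with the first preferred category whose
  regex matches the whole word, falling back to a bare WORD token."
  [word]
  (if-let [tag (some (fn [[tag re]] (when (re-matches re word) tag))
                     preferred)]
    [:TOKEN [tag word]]
    [:TOKEN word]))

(defn tokenize
  "Hypothetical helper: split on whitespace and classify each word on its own.
  Multi-word rules such as DRUGPK or EFF never get a chance to match here."
  [sentence]
  (mapv classify-word (str/split sentence #"\s+")))

(tokenize "Exposure to didanosine is increased")
;; => [[:TOKEN [:PK "Exposure"]]
;;     [:TOKEN "to"]
;;     [:TOKEN [:DRUG "didanosine"]]
;;     [:TOKEN "is"]
;;     [:TOKEN [:MECH "increased"]]]
```

Apart from the whitespace entries, this gives the same shape as the first part of the output Jim posted. If the parser really does treat each word separately like this, the composite rules would never show up.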