I am not familiar with instaparse, but the parser may be reasoning as 
follows:

- "Exposure" matches a "PK" TOKEN, which is preferred over a "WORD" TOKEN, so 
it is parsed as a "PK" TOKEN
- "to" matches only a "WORD" TOKEN, which is not preferred, but there is no 
other choice, so it is parsed as a "WORD" TOKEN
- ...

I don't think instaparse will consider aggregating words into more complex 
TOKENs when it can already parse each word as a TOKEN on its own (at worst 
as a "WORD" TOKEN, if none of the preferred one-word TOKENs match).

I'd say you have to rework your grammar to be less ambiguous (i.e., so that 
there is only one way to decompose a sentence), but unfortunately I don't 
know exactly how you should fix it for instaparse.
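
For what it's worth, here is a minimal sketch of what "less ambiguous" could 
look like, going only by the instaparse README rather than hands-on 
experience. It is a cut-down, hypothetical grammar (the name demo and the 
reduced rule set are mine, not your parsePK). The composite DRUGPK rule is 
listed first under ordered choice (/), and it carries no trailing SPACE of its 
own, on the assumption that the SPACE already placed between TOKENs in S would 
otherwise compete for the same whitespace. insta/parses should list every 
parse the grammar allows, which makes any leftover ambiguity visible.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical, cut-down grammar for illustration only (not the original parsePK).
;; Assumptions: DRUGPK is tried first via ordered choice (/), and it has no
;; trailing SPACE, because S already puts SPACE between TOKENs.
(def demo
  (insta/parser
   "S       = TOKEN (SPACE TOKEN)*
    TOKEN   = DRUGPK / PK / DRUG / WORD
    DRUGPK  = PK SPACE TO SPACE DRUG
    PK      = #'(?i)exposure|bioavailability'
    DRUG    = #'(?i)didanosine|quinidine|tenofovir'
    <TO>    = 'to' | 'of'
    <WORD>  = #'\\w+'
    <SPACE> = #'\\s+'"))

;; The parse instaparse prefers for this input.
(demo "Exposure to didanosine")

;; Every parse the grammar allows; more than one result means the grammar
;; is still ambiguous for this input.
(insta/parses demo "Exposure to didanosine")

;; A single rule can be checked in isolation with the :start option.
(demo "Exposure to didanosine" :start :DRUGPK)
```

If more than one tree comes back from insta/parses, the per-word 
decomposition is still a valid alternative and the grammar remains ambiguous 
in that sense.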

HTH
-Gianluca

On Monday, November 18, 2013 2:47:22 PM UTC+1, Jim foo.bar wrote:
>
>  Hi all,
>
> I'm having a small problem composing smaller matches in instaparse. Here 
> is what I'm trying...just observe the bold bits:
>
> (def parsePK
>   (insta/parser
>    "S  = TOKEN (SPACE TOKEN PUNCT?)* END
>    TOKEN = (NUM | DRUG | PK | DRUGPK | MECH | SIGN | EFF | ENCLOSED) / 
> WORD 
>    <WORD> = #'\\w+' | PUNCT 
>    <PUNCT> = #'\\p{Punct}'
>    ENCLOSED = PAREN | SQBR
>    <PAREN> = #'\\(.*\\)'
>    <SQBR> =  #'\\[.*\\]'
>     NUM =  #'[0-9]+'
>     ADV =   #'[a-z]+ly'
>    <SPACE> = #'\\s+'
>     DRUG =  #'(?i)didanosine|quinidine|tenofovir'
>     PK =    #'(?i)exposure|bioavailability|lower?[\\s|\\-]?clearance'
>     *DRUGPK =  PK SPACE TO SPACE DRUG SPACE EFF? SPACE *
>     MECH =  #'[a-z]+e(s|d)'
>     *EFF = BE? SPACE SIGN? SPACE MECH | BE? SPACE MECH SPACE ADV? *
>     SIGN =  ADV | NEG
>     NEG = 'not'
>     <TO> = 'to' | 'of'
>     <BE> = 'is' | 'are' | 'was' | 'were'
>     END =  '.' " ))
>
> Running the parser returns the following. It seems that the 2 bigger 
> composite rules DRUGPK & EFF are not recognised at all. Only the smaller 
> pieces are actually shown. I would expect [:TOKEN [:DRUGPK "Exposure to 
> didanosine is increased"]] and  [:TOKEN [:EFF "is increased"]] entries.
> (pprint   
> (parsePK "Exposure to didanosine is increased when coadministered with 
> tenofovir disoproxil fumarate [Table 5 and see Clinical Pharmacokinetics 
> (12.3, Tables 9 and 10)].")) 
>  
>
> [:S
>  [:TOKEN [:PK "Exposure"]]
>  " "
>  [:TOKEN "to"]
>  " "
>  [:TOKEN [:DRUG "didanosine"]]
>  " "
>  [:TOKEN "is"]
>  " "
>  [:TOKEN [:MECH "increased"]]
>  " "
>  [:TOKEN "when"]
>  " "
>  [:TOKEN [:MECH "coadministered"]]
>  " "
>  [:TOKEN "with"]
>  " "
>  [:TOKEN [:DRUG "tenofovir"]]
>  ","
>  " "
>  [:TOKEN "disoproxil"]
>  " "
>  [:TOKEN "fumarate"]
>  [:END "."]]
>
>  Am I thinking about it the wrong way? Can anyone shed some light? 
>
> many thanks in advance,
>
> Jim

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en