[ 
https://issues.apache.org/jira/browse/CTAKES-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Finan updated CTAKES-158:
------------------------------
    Priority: Minor  (was: Major)

> DateAnnotation bug when two dates directly adjacent
> ---------------------------------------------------
>
>                 Key: CTAKES-158
>                 URL: https://issues.apache.org/jira/browse/CTAKES-158
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-context-tokenizer
>    Affects Versions: 3.0-incubating, 3.1.0
>            Reporter: James Joseph Masanz
>            Priority: Minor
>
> From an email by Shady AbdelAziz to ctakes-dev@, February 11, 2013:
>   While working with DateAnnotation and adding some new state machines to 
> DateFSM.java, I found a minor bug in the starting and ending index of 
> DateAnnotation.
> Consider this small example:
> "October 2003 November 2010 cTAKES is the best framework".
> The expected result is "October 2003" and "November 2010", but cTAKES 
> detects "October 2003" and "October 2003 November 2010".
> This happens because, when the FSM detects the first date, there is no record 
> for it in "tokenStartMap", so the starting index is assumed to be "0". The FSM 
> then detects the second date, but there is still no record for it in the map 
> (a value is put in the map only when the machine is in a starting state, in 
> other words after a token that does not satisfy any condition), so the 
> starting index is again assumed to be "0".
> That's why detection works fine when, for example, there is an intermediate 
> token between the two dates.
> The solution is simply to put a record in the map before resetting the FSM, 
> by adding this line: "tokenStartMap.put(fsm, new Integer(i));".
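
The behaviour described above can be reproduced with a minimal, self-contained sketch. This is not the cTAKES source: the class AdjacentDateSketch, the toy "Month Year" Machine, the whitespace tokenizer, and the applyFix flag are all hypothetical and exist only for illustration. Only the name tokenStartMap and the idea of recording the index before the reset come from the report. The sketch uses autoboxing where the report writes "new Integer(i)", since that constructor is deprecated in recent Java versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AdjacentDateSketch {
    // Hypothetical, deliberately tiny month vocabulary.
    private static final Set<String> MONTHS = Set.of("October", "November");

    enum State { START, MONTH, END }

    // Toy "Month Year" machine; an assumed stand-in for one machine in DateFSM.
    static final class Machine {
        State state = State.START;

        void input(String token) {
            if (state == State.START && MONTHS.contains(token)) {
                state = State.MONTH;
            } else if (state == State.MONTH && token.matches("\\d{4}")) {
                state = State.END;
            } else {
                state = State.START; // token satisfied no condition
            }
        }

        void reset() {
            state = State.START;
        }
    }

    // Returns the detected date strings; applyFix toggles the proposed one-line fix.
    static List<String> findDates(String text, boolean applyFix) {
        String[] tokens = text.split("\\s+");
        Machine fsm = new Machine();
        Map<Machine, Integer> tokenStartMap = new HashMap<>();
        List<String> dates = new ArrayList<>();

        for (int i = 0; i < tokens.length; i++) {
            fsm.input(tokens[i]);
            if (fsm.state == State.START) {
                // Remember the last token that was not part of a match.
                tokenStartMap.put(fsm, i);
            }
            if (fsm.state == State.END) {
                Integer last = tokenStartMap.get(fsm);
                // With no record, the start index defaults to 0.
                int start = (last == null) ? 0 : last + 1;
                dates.add(String.join(" ", Arrays.copyOfRange(tokens, start, i + 1)));
                if (applyFix) {
                    // Proposed fix: record the end of this match before resetting.
                    tokenStartMap.put(fsm, i);
                }
                fsm.reset();
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        String text = "October 2003 November 2010 cTAKES is the best framework";
        System.out.println(findDates(text, false)); // [October 2003, October 2003 November 2010]
        System.out.println(findDates(text, true));  // [October 2003, November 2010]
    }
}
```

In this sketch, an input such as "October 2003 and November 2010" yields both dates correctly even without the fix, because the intermediate token "and" returns the machine to its starting state and so writes an index into the map.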



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
