[ https://issues.apache.org/jira/browse/CTAKES-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Finan updated CTAKES-158:
------------------------------
    Priority: Minor  (was: Major)

> DateAnnotation bug when two dates directly adjacent
> ---------------------------------------------------
>
>                 Key: CTAKES-158
>                 URL: https://issues.apache.org/jira/browse/CTAKES-158
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-context-tokenizer
>   Affects Versions: 3.0-incubating, 3.1.0
>            Reporter: James Joseph Masanz
>            Priority: Minor
>
> From an email from Shady AbdelAziz, February 11, 2013, on ctakes-dev@:
>
> While working with DateAnnotation and adding some new state machines in
> DateFSM.java, I found a minor bug regarding the starting and ending index of
> DateAnnotation.
>
> Consider the small example
> "October 2003 November 2010 cTAKES is the best framework".
> The result is supposed to be "October 2003" and "November 2010", but cTAKES
> detects "October 2003" and "October 2003 November 2010".
>
> This is because the FSM detects the first date and, as it has no record in
> "tokenStartMap", assumes the starting index is "0". It then starts detecting
> the second date, but there is still no record for it in the map (a value is
> put in the map only when the state is a starting state, in other words when
> a token does not satisfy any state), so it again assumes the starting index
> is "0".
>
> That is why, for example, if there is an intermediate token between the two
> dates, it works fine.
>
> The solution is simply to put a record in the map before resetting the FSM,
> so this line should be added: "tokenStartMap.put(fsm, new Integer(i));".
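To make the reported behaviour easier to follow, here is a minimal, self-contained Java sketch of the mechanism described in the report. It is not the cTAKES source: the class `AdjacentDateSketch`, the toy `MonthYearMachine`, the `findDates` method and its `applyFix` flag are all hypothetical names introduced for illustration. Only `tokenStartMap`, the default start index of 0, and the idea of calling `put` before the reset come from the report. The sketch also assumes that a recorded index points at the token just before the date, so the date starts one token later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class AdjacentDateSketch {

    enum State { START, MONTH, END }

    /** Hypothetical toy machine: a month name followed by a 4-digit year reaches END. */
    static final class MonthYearMachine {
        private static final Set<String> MONTHS = Set.of(
            "january", "february", "march", "april", "may", "june", "july",
            "august", "september", "october", "november", "december");

        private State state = State.START;

        void input(String token) {
            boolean isMonth = MONTHS.contains(token.toLowerCase(Locale.ROOT));
            boolean isYear = token.matches("\\d{4}");
            if (state == State.MONTH && isYear) {
                state = State.END;
            } else if (isMonth) {
                state = State.MONTH;
            } else {
                state = State.START; // token satisfies no condition
            }
        }

        State state() { return state; }

        void reset() { state = State.START; }
    }

    /** Returns the detected date strings; applyFix toggles the change proposed in the report. */
    static List<String> findDates(List<String> tokens, boolean applyFix) {
        MonthYearMachine fsm = new MonthYearMachine();
        Map<MonthYearMachine, Integer> tokenStartMap = new HashMap<>();
        List<String> dates = new ArrayList<>();

        for (int i = 0; i < tokens.size(); i++) {
            fsm.input(tokens.get(i));

            // A record is written only while the machine sits in its start state.
            if (fsm.state() == State.START) {
                tokenStartMap.put(fsm, i);
            }

            if (fsm.state() == State.END) {
                Integer last = tokenStartMap.get(fsm);
                // No record: assume the date starts at token 0.
                // Otherwise (assumption of this sketch): it starts one token after the record.
                int start = (last == null) ? 0 : last + 1;
                dates.add(String.join(" ", tokens.subList(start, i + 1)));

                if (applyFix) {
                    // Proposed fix: record the end token before resetting the machine.
                    tokenStartMap.put(fsm, i);
                }
                fsm.reset();
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("October", "2003", "November", "2010",
                                      "cTAKES", "is", "the", "best", "framework");
        System.out.println(findDates(tokens, false)); // [October 2003, October 2003 November 2010]
        System.out.println(findDates(tokens, true));  // [October 2003, November 2010]
    }
}
```

Without the extra `put`, the map is still empty when the second date ends, so its start falls back to 0 and the span swallows the first date. With an intermediate token between the dates, the start-state branch writes a record and both variants agree, which matches the observation in the report.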