[ https://issues.apache.org/jira/browse/CTAKES-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650810#comment-17650810 ]
Richard Eckart de Castilho commented on CTAKES-16:
--------------------------------------------------

Actually, use the `cas.select()` method of UIMA v3.

> use uimaFIT's selectCovered() instead of UIMA's subiterator
> -----------------------------------------------------------
>
>                 Key: CTAKES-16
>                 URL: https://issues.apache.org/jira/browse/CTAKES-16
>             Project: cTAKES
>          Issue Type: Improvement
>          Components: ctakes-assertion, ctakes-chunker,
>                      ctakes-clinical-pipeline, ctakes-context-tokenizer, ctakes-core,
>                      ctakes-dependency-parser, ctakes-ne-contexts, ctakes-pos-tagger
>            Reporter: Pei Chen
>            Priority: Minor
>
> Could not get consistent results from .subiterator when using uimaFIT with
> the cTAKES GUI (which wires the components together dynamically).
> To get all the BaseTokens for a particular sentence, if we use
> .subiterator, the types have to be stored in the FSIndexes in a certain order;
> otherwise it may just return an empty list. This would require users of the
> annotators to understand the ordering of types and have it preconfigured.
>
>   FSIterator<Annotation> tokensInSentenceIterator =
>       jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
>
> uimaFIT already provides a convenience method that seems to do something
> similar and always returns the expected tokens. Does anyone know if this
> was part of the motivation? Is the performance hit (if any) worth the
> ease of use?
> Ex:
>
>   List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas,
>       BaseToken.class, sentence);
>
> Another alternative is UIMA's FilteredIterator.
> There are a few places that use subiterator in cTAKES, and it's tempting to
> use uimaFIT's JCasUtil.selectCovered() instead... What do others think?
> Background: This issue surfaced when we used the cTAKES GUI (which uses
> uimaFIT to wire the components together instead of the Aggregate XML
> descriptor).
> --Pei
>
> On Aug 9, 2012, at 9:18 AM, Chen, Pei wrote:
> > To get all the BaseTokens for a particular sentence, if we use
> > .subiterator, the types have to be stored in the FSIndexes in a certain
> > order; otherwise it may just return an empty list. This would require
> > users of the annotators to understand the ordering of types and have it
> > preconfigured.
> >   FSIterator<Annotation> tokensInSentenceIterator =
> >       jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
> > uimaFIT already provides a convenience method that seems to do something
> > similar and always returns the expected tokens. Does anyone know if this
> > was part of the motivation?
>
> Yes, avoiding subiterators was exactly the motivation. Our experience in
> uimaFIT was that subiterators never did what you wanted them to do.
>
> > Is the performance hit (if any) worth the ease of use?
>
> I doubt there's a performance hit. Take a look at the source for
> JCasUtil.selectCovered vs. org.apache.uima.cas.impl.Subiterator. If anything,
> selectCovered is probably doing less.
> But of course you could time it and find out for sure.
>
> Steve
>
> The full discussion thread can be found here:
> http://markmail.org/search/+list:org.apache.incubator.ctakes-dev#query:%20list%3Aorg.apache.incubator.ctakes-dev+page:1+mid:hcp3rudjelddo2dy+state:results
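Steve's point that selectCovered "is probably doing less" than a subiterator is easier to see with offsets alone. The following is a minimal sketch in plain Java, with no UIMA dependency, of what "covered" selection means: keep every annotation whose begin and end both lie inside the covering annotation. `Span`, `CoveredSelection` and both method names are hypothetical stand-ins, not UIMA or uimaFIT API, and this is not the uimaFIT source. The sorted variant assumes annotations are ordered by begin ascending and end descending, as UIMA's built-in annotation index is; real indexes also break ties by type priority, which this sketch deliberately leaves out. In UIMA v3 code, the library call matching Richard's suggestion would be along the lines of `jcas.select(BaseToken.class).coveredBy(sentence)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: plain-Java model of "covered" selection. Span and all method
// names here are hypothetical; they are not UIMA or uimaFIT API.
public class CoveredSelection {

    // Hypothetical stand-in for an annotation: character offsets [begin, end) and a label.
    record Span(String label, int begin, int end) {}

    // Begin ascending, then end descending: the first two sort keys of UIMA's
    // built-in annotation index. Type priorities (the real tie-breaker) are omitted.
    static final Comparator<Span> INDEX_ORDER =
            Comparator.comparingInt(Span::begin)
                    .thenComparing(Comparator.comparingInt(Span::end).reversed());

    // Linear scan: keep every span lying inside the covering span's offsets.
    // Depends only on offsets, never on index order or type priorities.
    static List<Span> selectCoveredLinear(List<Span> spans, Span cover) {
        List<Span> result = new ArrayList<>();
        for (Span s : spans) {
            if (s.begin() >= cover.begin() && s.end() <= cover.end()) {
                result.add(s);
            }
        }
        return result;
    }

    // Same result on a list sorted by begin offset: binary-search the first span
    // with begin >= cover.begin, then scan forward until begin passes cover.end.
    static List<Span> selectCoveredSorted(List<Span> sorted, Span cover) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin() < cover.begin()) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        List<Span> result = new ArrayList<>();
        for (int i = lo; i < sorted.size() && sorted.get(i).begin() <= cover.end(); i++) {
            Span s = sorted.get(i);
            if (s.end() <= cover.end()) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Offsets refer to the text "Pain resolved. Patient stable."
        List<Span> tokens = new ArrayList<>(List.of(
                new Span("Pain", 0, 4), new Span("resolved", 5, 13), new Span(".", 13, 14),
                new Span("Patient", 15, 22), new Span("stable", 23, 29), new Span(".", 29, 30)));
        tokens.sort(INDEX_ORDER);

        for (Span sentence : List.of(new Span("S1", 0, 14), new Span("S2", 15, 30))) {
            List<Span> covered = selectCoveredSorted(tokens, sentence);
            // Both strategies must agree; only the amount of work differs.
            if (!covered.equals(selectCoveredLinear(tokens, sentence))) {
                throw new IllegalStateException("strategies disagree");
            }
            System.out.println(sentence.label() + " -> "
                    + covered.stream().map(Span::label).toList());
        }
        // Prints:
        // S1 -> [Pain, resolved, .]
        // S2 -> [Patient, stable, .]
    }
}
```

Neither strategy consults type priorities, which is the property the thread is after: the result depends only on offsets, not on how types happen to be ordered in the index. Whether either is faster than a subiterator in a real pipeline is, as Steve says, something to measure rather than assume.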