[ 
https://issues.apache.org/jira/browse/CTAKES-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650810#comment-17650810
 ] 

Richard Eckart de Castilho commented on CTAKES-16:
--------------------------------------------------

Actually, use the `cas.select()` method of UIMAv3.
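
In UIMA v3 this would presumably be a call along the lines of
`jCas.select(BaseToken.class).coveredBy(sentence)`. The block below is not
UIMA code. It is a minimal plain-Java sketch of what such a "covered by"
selection computes, and of why the result depends only on offsets rather than
on how types are ordered in the index. The names `CoveredSelection`, `Span`
and this `selectCovered` are hypothetical stand-ins, not UIMA or uimaFIT
classes.

```java
import java.util.ArrayList;
import java.util.List;

public class CoveredSelection {

    /** Hypothetical stand-in for an annotation: a typed span of text offsets. */
    public static final class Span {
        final String type;
        final int begin;
        final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Returns every span of the requested type whose offsets lie inside the
     * covering span. Only begin/end offsets are compared, so the order in
     * which spans were added to the index does not change which spans match.
     */
    public static List<Span> selectCovered(List<Span> index, String type, Span covering) {
        List<Span> result = new ArrayList<>();
        for (Span s : index) {
            if (s == covering) {
                continue; // do not return the covering span itself
            }
            if (s.type.equals(type) && s.begin >= covering.begin && s.end <= covering.end) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Span sentence = new Span("Sentence", 0, 11);

        // Tokens and the sentence are deliberately added in mixed order.
        List<Span> index = new ArrayList<>();
        index.add(new Span("BaseToken", 0, 5));   // inside the sentence
        index.add(sentence);
        index.add(new Span("BaseToken", 6, 11));  // inside the sentence
        index.add(new Span("BaseToken", 12, 15)); // outside the sentence

        List<Span> tokens = selectCovered(index, "BaseToken", sentence);
        System.out.println(tokens.size()); // prints 2
    }
}
```

A real pipeline would use the UIMA index rather than a linear scan; the sketch
only shows the matching rule.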

> use uimaFIT's selectCovered() instead of UIMA's subiterator
> -----------------------------------------------------------
>
>                 Key: CTAKES-16
>                 URL: https://issues.apache.org/jira/browse/CTAKES-16
>             Project: cTAKES
>          Issue Type: Improvement
>          Components: ctakes-assertion, ctakes-chunker, 
> ctakes-clinical-pipeline, ctakes-context-tokenizer, ctakes-core, 
> ctakes-dependency-parser, ctakes-ne-contexts, ctakes-pos-tagger
>            Reporter: Pei Chen
>            Priority: Minor
>
> Could not get consistent results from .subiterator when using uimaFIT with 
> the cTAKES GUI (which wires the components together dynamically).
> To get all the BaseTokens for a particular sentence, if we use the 
> .subiterator, the types have to be stored in the FSindexes in a certain 
> order; otherwise it could just return an empty list.  This would require the 
> users of annotators to understand the ordering of types and have it 
> preconfigured.
> FSIterator<Annotation> tokensInSentenceIterator = 
> jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
> uimaFIT already created a convenience method that seems to do something 
> similar which will always return the expected tokens.  Does anyone know if 
> this was part of the motivation?  Is the performance hit (if any) worth the 
> ease of use?
> Ex:
> List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas, 
> BaseToken.class, sentence);
> Another alternative is UIMA's FilteredIterator.
> There are a few places that use subiterator in cTAKES and it's tempting to 
> use uimaFIT's JCasUtil.selectCovered() instead... What do others think?
> Background: This issue surfaced when we use the cTAKES GUI (which uses 
> uimaFIT to wire the components together instead of the Aggregate XML 
> descriptor).
> --Pei
> On Aug 9, 2012, at 9:18 AM, Chen, Pei wrote:
> > To get all the BaseTokens for a particular sentence, if we use the
> > .subiterator, the types have to be stored in the FSindexes in a certain
> > order; otherwise it could just return an empty list.  This would require
> > the users of annotators to understand the ordering of types and have it
> > preconfigured.
> > FSIterator<Annotation> tokensInSentenceIterator =
> > jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
> > uimaFIT already created a convenience method that seems to do something
> > similar which will always return the expected tokens.  Does anyone know
> > if this was part of the motivation?
> Yes, that was exactly the motivation to avoid using subiterators. Our
> experience in uimaFIT was that subiterators never did what you wanted them
> to do.
> > Is the performance hit (if any) worth the ease of use?
> I doubt there's a performance hit. Take a look at the source for
> JCasUtil.selectCovered vs. org.apache.uima.cas.impl.Subiterator. If anything,
> selectCovered is probably doing less.
> But of course you could time it and find out for sure.
> Steve
> The full discussion thread can be found here: 
> http://markmail.org/search/+list:org.apache.incubator.ctakes-dev#query:%20list%3Aorg.apache.incubator.ctakes-dev+page:1+mid:hcp3rudjelddo2dy+state:results



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
