[ https://issues.apache.org/jira/browse/CTAKES-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy Dligach reopened CTAKES-449:
------------------------------------

We tested this on a couple of relatively large note files, and it took about a day to process two files, 1 MB and 6 MB in size. So I suspect the problem is still there despite the improvements that Sean made.

> PolarityCleartkAnalysisEngine slow for large documents
> ------------------------------------------------------
>
>                 Key: CTAKES-449
>                 URL: https://issues.apache.org/jira/browse/CTAKES-449
>             Project: cTAKES
>          Issue Type: Improvement
>          Components: ctakes-assertion
>            Reporter: Dmitriy Dligach
>            Assignee: Sean Finan
>            Priority: Major
>             Fix For: 4.0.1
>
>
> As soon as I add the negation AE at the end of my pipeline:
>
>     aggregateBuilder.add(
>         PolarityCleartkAnalysisEngine.createAnnotatorDescription() );
>
> the pipeline becomes 50-100 times slower. This likely has to do with the line:
>
>     List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas,
>         Sentence.class, entityOrEventMention.getBegin(),
>         entityOrEventMention.getEnd()));
>
> in AssertionCleartkAnalysisEngine. I am running the pipeline on large files
> (i.e., files with a large number of sentences). The slowdown is caused by the
> code scanning all sentences in the document for each identified annotation.
>
> The full pipeline is here:
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
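A minimal, self-contained sketch of the cost pattern described in the report. It does not use cTAKES or uimaFIT: `CoveringSentenceDemo`, `Span`, `coveringNaive`, and `coveringIndexed` are hypothetical names, and `Span` stands in for a UIMA annotation with begin/end offsets. Checking every sentence for every mention costs O(M·S) for M mentions and S sentences, which grows quickly on large notes. Assuming sentences are sorted by begin offset and do not overlap (an assumption, typical of sentence segmentation), a binary search reduces each lookup to O(log S).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not cTAKES code: find the sentence covering a mention. */
public class CoveringSentenceDemo {

    /** Stand-in for an annotation with begin/end character offsets. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Naive lookup: checks every sentence for each mention, O(S) per call. */
    static List<Span> coveringNaive(List<Span> sentences, Span mention) {
        List<Span> result = new ArrayList<>();
        for (Span s : sentences) {
            if (s.begin <= mention.begin && s.end >= mention.end) {
                result.add(s);
            }
        }
        return result;
    }

    /**
     * Indexed lookup: assumes sentences are sorted by begin and non-overlapping, so at
     * most one sentence can cover the mention. Binary search finds the last sentence that
     * starts at or before the mention, O(log S) per call. Returns null if none covers it.
     */
    static Span coveringIndexed(List<Span> sortedSentences, Span mention) {
        int lo = 0, hi = sortedSentences.size() - 1, candidate = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedSentences.get(mid).begin <= mention.begin) {
                candidate = mid;  // starts early enough; keep looking to the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (candidate < 0) {
            return null;
        }
        Span s = sortedSentences.get(candidate);
        return s.end >= mention.end ? s : null;  // must also end after the mention
    }

    public static void main(String[] args) {
        List<Span> sentences = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            sentences.add(new Span(i * 10, i * 10 + 9));  // [0,9], [10,19], ...
        }
        Span mention = new Span(23, 27);
        System.out.println(coveringNaive(sentences, mention).get(0).begin);  // 20
        System.out.println(coveringIndexed(sentences, mention).begin);       // 20
    }
}
```

In a real pipeline the equivalent idea is to build the mention-to-sentence mapping once per document rather than querying per mention; uimaFIT's `JCasUtil.indexCovering` is intended for this (check its signature for your uimaFIT version). Whether the eventual CTAKES-449 fix takes this route is not stated in this issue.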