[jira] [Reopened] (CTAKES-449) PolarityCleartkAnalysisEngine slow for large documents

Dmitriy Dligach (JIRA) Fri, 27 Jul 2018 11:46:13 -0700


     [ 
https://issues.apache.org/jira/browse/CTAKES-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dmitriy Dligach reopened CTAKES-449:
------------------------------------

We tested this on two relatively large note files (1M and 6M in size), and 
processing them took about a day. So I suspect the problem is still there 
despite the improvements that Sean made.

> PolarityCleartkAnalysisEngine slow for large documents
> ------------------------------------------------------
>
>                 Key: CTAKES-449
>                 URL: https://issues.apache.org/jira/browse/CTAKES-449
>             Project: cTAKES
>          Issue Type: Improvement
>          Components: ctakes-assertion
>            Reporter: Dmitriy Dligach
>            Assignee: Sean Finan
>            Priority: Major
>             Fix For: 4.0.1
>
>
> As soon as I add the negation AE at the end of my pipeline:
>     aggregateBuilder.add( 
>         PolarityCleartkAnalysisEngine.createAnnotatorDescription() );
> the pipeline becomes 50-100 times slower. This likely has to do with the 
> following line in AssertionCleartkAnalysisEngine:
>     List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, 
>         Sentence.class, entityOrEventMention.getBegin(), 
>         entityOrEventMention.getEnd()));
> I am running the pipeline on large files (i.e. files with a large number of 
> sentences). The slowdown is caused by the code obtaining all sentences in 
> the document for each identified annotation.
> The full pipeline is here:
> https://github.com/dmitriydligach/ctakes-misc/blob/master/src/main/java/org/apache/ctakes/pipelines/UmlsLookupPipeline.java
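
One possible way around the per-annotation scan described above is to sort the sentence spans once per document and then find the covering sentence for each mention with a binary search. The sketch below is only an illustration in plain Java, not the code in AssertionCleartkAnalysisEngine: the class CoveringSentenceLookup and its Span type are hypothetical stand-ins for UIMA annotations, and it assumes that sentences do not overlap. uimaFIT also provides JCasUtil.indexCovering, which builds a map from annotations to their covering annotations for a whole CAS; whether that is the better fit here is untested.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper (not part of cTAKES): finds the sentence covering a
 * mention in O(log n) after a one-time sort, instead of scanning every
 * sentence for every mention.
 */
public class CoveringSentenceLookup {

    /** Hypothetical stand-in for a UIMA annotation: character offsets only. */
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Sentences sorted by begin offset; assumed not to overlap each other.
    private final List<Span> sentences;

    /** Copies and sorts the sentences once per document. */
    public CoveringSentenceLookup(List<Span> sentences) {
        this.sentences = new ArrayList<>(sentences);
        this.sentences.sort(Comparator.comparingInt((Span s) -> s.begin));
    }

    /**
     * Returns the sentence whose span covers [begin, end], or null if no
     * sentence covers it (for example, a mention crossing a sentence boundary).
     */
    public Span findCovering(int begin, int end) {
        int lo = 0;
        int hi = sentences.size() - 1;
        int candidate = -1;
        // Binary search for the last sentence that starts at or before 'begin'.
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sentences.get(mid).begin <= begin) {
                candidate = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // With non-overlapping sentences, only this candidate can cover the span.
        if (candidate >= 0 && sentences.get(candidate).end >= end) {
            return sentences.get(candidate);
        }
        return null;
    }
}
```

With this approach the lookup would be built once per document and queried once per mention, so the total cost grows roughly as (sentences + mentions) x log(sentences) rather than sentences x mentions. If sentences can overlap, the single-candidate check no longer holds and an interval index would be needed instead.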



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
