+1 I think that a factory is a great idea.

I (personally) dislike the descriptor schema, but I think that deprecation is 
the way to go until a replacement comes along.  



-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Tuesday, April 15, 2014 9:54 AM
To: dev@ctakes.apache.org
Subject: suggestion for default pipelines

The discussion in the other thread with Abraham Tom gave me an idea I
wanted to float to the list. We have been using some UIMAFit pipeline
builders in the temporal project that maybe could be moved into
clinical-pipeline. For example, look at this file:

http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup

with the static methods getPreprocessorAggregateBuilder() and
getLightweightPreprocessorAggregateBuilder()   [no umls].

So my idea would be to create a class in clinical-pipeline
(CTakesPipelines) with static methods for some standard pipelines (to
return AnalysisEngineDescriptions instead of AggregateBuilders?):

getStandardUMLSPipeline()  -- builds pipeline currently in
AggregatePlaintextUMLSProcessor.xml
getFullPipeline() -- same as above but with SRL, constituency parsing,
etc., every component in ctakes
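
A minimal sketch of how such a class might look, assuming plain Java
stand-ins so that it compiles without UIMA or uimaFIT on the classpath.
PipelineDescription is a hypothetical placeholder for
AnalysisEngineDescription, and the component names are placeholders rather
than the real contents of AggregatePlaintextUMLSProcessor.xml. Only the
class name and the two method names come from the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a UIMA AnalysisEngineDescription: it only records
// the ordered component names so the sketch compiles without UIMA or uimaFIT.
final class PipelineDescription {
    private final List<String> components;

    PipelineDescription(List<String> components) {
        this.components = Collections.unmodifiableList(new ArrayList<>(components));
    }

    List<String> getComponents() {
        return components;
    }
}

// Sketch of the proposed factory class. The method names come from the
// proposal; the component names below are placeholders.
final class CTakesPipelines {
    private CTakesPipelines() {
        // static factory methods only, no instances
    }

    // Placeholder for whatever AggregatePlaintextUMLSProcessor.xml wires together.
    static PipelineDescription getStandardUMLSPipeline() {
        List<String> components = new ArrayList<>();
        components.add("standard-umls-components (placeholder)");
        return new PipelineDescription(components);
    }

    // Same as the standard pipeline, plus the extra components named in the proposal.
    static PipelineDescription getFullPipeline() {
        List<String> components =
                new ArrayList<>(getStandardUMLSPipeline().getComponents());
        components.add("srl (placeholder)");
        components.add("constituency-parser (placeholder)");
        return new PipelineDescription(components);
    }
}
```

Building getFullPipeline() on top of getStandardUMLSPipeline() keeps the two
definitions from drifting apart, which is one way to read "same as above but
with" in the list.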

We could then potentially merge our entry points -- I think Abraham's
experience points out that this is currently confusing, as well as
probably not implemented optimally. For example, either
ClinicalPipelineWithUmls or BagOfCUIsGenerator would use that static
method to run a uimafit-style pipeline. Maybe we can slowly deprecate
our xml descriptors too unless people feel strongly about keeping those
around.

Another benefit is that the cTAKES API is then trivial -- if you add
cTAKES as a dependency in your pom file, getting a UIMA pipeline is one
UIMAFit call:

builder.add(CTakesPipelines.getStandardUMLSPipeline());


I think this would actually be pretty easy to implement, but hoping to
get some feedback on whether this is a good direction.

Tim


