RE: Command line invocation

2014-04-15 Thread Miller, Timothy
It does look like that class has hardcoded paths. There may be another pipeline 
that is appropriate -- what kind of output are you looking to get from your 
documents?

Tim


From: Abraham Tom [a...@practicefusion.com]
Sent: Tuesday, April 15, 2014 12:08 AM
To: dev@ctakes.apache.org
Subject: RE: Command line invocation

Update
java -cp $CTAKES_HOME/lib/*:$CTAKES_HOME/desc/:$CTAKES_HOME/resources/ \
  -Dlog4j.configuration=file:$CTAKES_HOME/config/log4j.xml -Xms512M -Xmx1024M \
  org.apache.ctakes.clinicalpipeline.ClinicalPipelineWithUmls \
  /opt/cTAKES-3.1.1/data/test01.txt /opt/cTAKES-3.1.1/data_out/

Exception in thread "main" org.apache.uima.resource.ResourceInitializationException
    at org.cleartk.util.cr.FilesCollectionReader.initialize(FilesCollectionReader.java:251)
    at org.uimafit.component.JCasCollectionReader_ImplBase.initialize(JCasCollectionReader_ImplBase.java:57)
    at org.apache.uima.collection.CollectionReader_ImplBase.initialize(CollectionReader_ImplBase.java:71)
    at org.apache.uima.impl.CollectionReaderFactory_impl.produceResource(CollectionReaderFactory_impl.java:103)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceCollectionReader(UIMAFramework.java:711)
    at org.uimafit.factory.CollectionReaderFactory.createCollectionReader(CollectionReaderFactory.java:171)
    at org.cleartk.util.cr.FilesCollectionReader.getCollectionReader(FilesCollectionReader.java:88)
    at org.apache.ctakes.clinicalpipeline.ClinicalPipelineWithUmls.main(ClinicalPipelineWithUmls.java:56)
Caused by: java.io.IOException: file or directory /sharp-home/assertion/data/ActiveLearning/plaintext does not exist
    ... 10 more
file or directory /sharp-home/assertion/data/ActiveLearning/plaintext

Does this mean I cannot pass in a file and an output directory, and that I 
have to use the directory the class expects?

Best regards,

Abraham Tom


-Original Message-
From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
Sent: Sunday, April 13, 2014 5:32 AM
To: 'dev@ctakes.apache.org'
Subject: RE: Command line invocation


Yes, that's possible.

You need more entries on your classpath; take a look at the classpath set 
within runctakesCVD.bat.

Or, if you are open to using Groovy, take a look at the scripts directory within 
ctakes-core.

-Original Message-
From: Abraham Tom [mailto:a...@practicefusion.com]
Sent: Saturday, April 12, 2014 10:55 AM
To: dev@ctakes.apache.org
Subject: Command line invocation

I am not a core Java developer; I am a Hadoop data guy. We are experimenting 
with cTAKES and we have no Java developers in house.

I am trying to invoke ClinicalPipelineWithUmls on a server where I installed 
the developer version of cTAKES, using the following command line:

java -verbose -cp 
"/opt/cTAKES-3.1.1/ctakes-clinical-pipeline/target/classes;/home/mapr/.m2/repository/org/cleartk/cleartk-util/0.9.2/cleartk-util-0.9.2.jar"
 org.apache.ctakes.clinicalpipeline.ClinicalPipelineWithUmls

but I am getting a class-not-found error.

I would like to invoke it from the command line so that I can wrap a shell 
script around it and automate the processing of various documents.

This should be possible, shouldn't it?


Best regards,

Abraham Tom



suggestion for default pipelines

2014-04-15 Thread Miller, Timothy
The discussion in the other thread with Abraham Tom gave me an idea I
wanted to float to the list. We have been using some uimaFIT pipeline
builders in the temporal project that could perhaps be moved into
clinical-pipeline. For example, look at this file:

http://svn.apache.org/viewvc/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/pipelines/TemporalExtractionPipeline_ImplBase.java?view=markup

with the static methods getPreprocessorAggregateBuilder() and
getLightweightPreprocessorAggregateBuilder() [no UMLS].

So my idea would be to create a class in clinical-pipeline
(CTakesPipelines) with static methods for some standard pipelines (to
return AnalysisEngineDescriptions instead of AggregateBuilders?):

getStandardUMLSPipeline() -- builds the pipeline currently defined in
AggregatePlaintextUMLSProcessor.xml
getFullPipeline() -- same as above, plus SRL, constituency parsing,
etc.: every component in cTAKES

We could then potentially merge our entry points -- I think Abraham's
experience shows that the current set is confusing, and probably not
implemented optimally. For example, either ClinicalPipelineWithUmls or
BagOfCUIsGenerator could use that static method to run a uimaFIT-style
pipeline. Maybe we can also slowly deprecate our XML descriptors, unless
people feel strongly about keeping them around.

Another benefit is that the cTAKES API then becomes trivial -- if you add
cTAKES as a dependency in your pom file, getting a UIMA pipeline is one
uimaFIT call:

builder.add(CTakesPipelines.getStandardUMLSPipeline());


I think this would actually be pretty easy to implement, but I am hoping
to get some feedback on whether this is a good direction.

Tim
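
The shape of the proposed class can be sketched as follows. This is only
an illustration under stated assumptions: PipelineDescription is a
stand-in for UIMA's AnalysisEngineDescription so that the example compiles
with the JDK alone, and the component names are placeholders, not the
actual contents of AggregatePlaintextUMLSProcessor.xml. Only the class and
method names come from the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in for UIMA's AnalysisEngineDescription (an assumption made so
// that this sketch needs nothing beyond the JDK).
final class PipelineDescription {
    private final List<String> components;

    PipelineDescription(List<String> components) {
        this.components =
                Collections.unmodifiableList(new ArrayList<>(components));
    }

    // Ordered names of the components in this pipeline.
    List<String> components() {
        return components;
    }
}

// Sketch of the proposed factory: one static method per standard pipeline.
final class CTakesPipelines {
    private CTakesPipelines() {
        // Static factory only; not meant to be instantiated.
    }

    // Placeholder component names, not the real descriptor contents.
    static PipelineDescription getStandardUMLSPipeline() {
        return new PipelineDescription(
                Arrays.asList("preprocessing", "umls-lookup"));
    }

    // Starts from the standard pipeline and appends further components,
    // mirroring "same as above but with SRL, constituency parsing, etc."
    static PipelineDescription getFullPipeline() {
        List<String> all =
                new ArrayList<>(getStandardUMLSPipeline().components());
        all.addAll(Arrays.asList("srl", "constituency-parser"));
        return new PipelineDescription(all);
    }
}
```

In a real implementation the methods would presumably return descriptions
built with uimaFIT, so that a caller can add the result to an
AggregateBuilder as in the one-line call above. Building getFullPipeline()
on top of getStandardUMLSPipeline() keeps the two from drifting apart.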





RE: suggestion for default pipelines

2014-04-15 Thread Finan, Sean
+1 I think that a factory is a great idea.

I (personally) dislike the descriptor schema, but I think that deprecation is 
the way to go until a replacement comes along.  








RE: suggestion for default pipelines

2014-04-15 Thread Masanz, James J.
+1






RE: suggestion for default pipelines

2014-04-15 Thread Abraham Tom
+1



Best regards,

Abraham Tom

Abraham Tom
Data Warehouse Engineer
415.757.4674 (p) | 415.356.0950 (f)
a...@practicefusion.com
http://www.practicefusion.com
www.facebook.com/practicefusion








Re: suggestion for default pipelines

2014-04-15 Thread Steven Bethard
+1. And note that once you have a descriptor, you can generate the
XML, so we should arrange to replace the current XML descriptors with
ones generated automatically from the uimaFIT code. That should reduce
the synchronization problems that arise when the Java code is changed
but the XML descriptor is not.

Steve
