Re: Question about the pipeline

2015-02-03 Thread Maite Meseure Hugues
Thanks a lot, Sean, for your detailed reply. I've also found RunCPE.java, which
lets you set the input and output directories as arguments in the
environment and does the same job as the CPE GUI, at least in Eclipse; I
haven't managed to run it from the command line yet.

On Mon, Feb 2, 2015 at 7:12 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Tol (and Maite),
>
> I'm not entirely certain that I understand the question, but here is an
> attempt to help.  If I'm oversimplifying then I apologize.
>
> I think that ExampleAggregatePipeline is intended to represent a very
> simple single-note pipeline and that custom code could be produced by using
> it as an example.
>
> If you want to process texts in a directory, you can find with a web
> search plenty of ways to list files in a directory and read text from
> files.  org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader might be
> what you used in the CPE, and you can certainly peruse the code and take
> what you need.  Or, if you decide to write a simple diy,  here is one
> possibility:
>
> Static public Collection getFilesInDir( final File directory ) {
>    final Collection fileList = new ArrayList<>();
>    final File[] fileList = directory.listFiles();
>    if ( fileList == null ) {
>       System.err.println( "please check the directory " + directory.getAbsolutePath() );
>       System.exit( 1 );
>    }
>    for ( final File file : directory.listFiles() ) {
>       if ( file.canRead() ) {
>          fileList.add( file );
>       }
>    }
> }
>
> Static public String getTextInFile( final File file ) throws IOException {   -- or handle ioE herein
>    final Path nioPath = file.toPath();
>    return new String( Files.readAllBytes( nioPath ) );
> }
>
> Static public void main( String ... args ) {
>    If ( args[0].isEmpty() ) {
>       System.out.println( "Enter a directory path" );
>       System.exit( 0 );
>    }
>    Final Collection files = getFilesInDir( new File( args[0] );
>    For ( File file : files ) {
>       Final String note = getTextInFile( file );
>       ---  Insert here code a' la ExampleAggregatePipeline  ---
>       ---  swap out the writer in ExampleAggregatePipeline with CasIOUtil method (below)  ---
>    }
> }
>
> I must admit that I have never directly used it, but there is an xmi file
> writing method in org.apache.uima.fit.util.CasIOUtil named writeXmi( JCas
> jCas, File file ).  You could give this a try and see if it produces the
> type of output that you want.  The same utility class has a writeXCas(..)
> method.
>
>
> If the above has absolutely nothing to do with your needs then please send
> me a bulleted list of items, example workflow, etc. and I'll see if I can
> be of service.
>
> Oh, and I wrote the above code freehand, so MS Outlook is adding capital
> letters, etc.  If you cut and paste you'll need to change that - plus I
> haven't run/compiled, so there might be a typo or missed exception or
> something.  Or it may not work (in which case I'll throw in a little more
> effort).
>
> Sean
>
>
> -Original Message-
> From: Tol O. [mailto:tol...@gmail.com]
> Sent: Monday, February 02, 2015 6:56 PM
> To: dev@ctakes.apache.org
> Subject: Re: Question about the pipeline
>
> Maite Meseure Hugues  writes:
>
> >
> > Hello all,
> >
> > Thank you for your preceding answers.
> > I have a few questions regarding the pipeline example to run cTakes
> > programmatically.
> > I am running ExampleAggregatePipeline.java with
> > ExampleHelloWorldAnnotator but I would like to know how I can change
> > it to run my data, as the CPE where we can choose the directory of our
> > data.
> > My second question is about the xml output generated with the CPE, can
> > I get the same xml output in using the example pipeline? and How?
> > Thanks for your time.
>
>
> I would like to ask the same question. After successfully setting up
> CTAKES following the Developers Guide I would also like to use a modified
> ExampleAggregatePipeline to output a CAS file identical to the output
> obtained by the CPE or the CVD when following the Users Guide.
>
> This would be a great help for developers as a starting class to be able
> to programmatically obtain an annotated file based on a plaintext or XML
> input, same as through the two GUIs.
>
> Right now I am reading through the Component Use Guide to replicate the
> CPE or the CVD tutorial with the test input, but it is a bit overwhelming.
>
> Any pointers or suggestions would be really appreciated.
>
> Tol O.
>
>


-- 
--
 Maïté Meseure Hugues


RE: Question about the pipeline

2015-02-03 Thread Finan, Sean
Hi Maite,

RunCPE is a good find, and if it fits your bill then you should use it.  But it 
(if you mean the yTex class) doesn't take input and output directories from the 
command line.  It does take the path to a CPE.xml file.  There is a cTakes 
(non-yTex) equivalent named CmdLineCpeRunner.  Either one of them should print 
a usage if you run it without arguments.  As the CmdLineCpeRunner indicates, 
you can create a cpe .xml file with the cpe gui.  Basically, start the cpe gui, 
select your input (reader), output (writer) and pipeline (ae) in the gui and 
then save the cpe descriptor (via the menubar).  You can exit the gui and run 
either one of the cmd line utilities with the path to that cpe .xml descriptor 
as the argument.  Please note: sometimes you have to explicitly type ".xml" in 
the filename when saving with the cpe gui.  If you run with the cpe gui and 
then exit it should automatically ask you if you want to save the cpe .xml 
descriptor.  Anyway, once you have the .xml file you can always edit the input 
and output paths in that file to change your run parameters.  
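
The last step, editing the input and output paths in a saved descriptor, can also be scripted. The following is only a sketch under assumptions: it relies on the usual CPE descriptor layout in which each setting is a nameValuePair element holding a name and a string value, and the parameter name InputDirectory used in the usage note is an assumption that should be checked against the descriptor the CPE GUI actually saved. The class CpeParameterEditor is hypothetical and not part of cTAKES or UIMA; it uses only the JDK XML classes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of cTAKES or UIMA.
final class CpeParameterEditor {

   // Sets the <string> value of every <nameValuePair> whose <name> matches.
   // Returns the number of pairs that were changed.
   static int setStringParameter( final Document doc, final String name, final String value ) {
      int changed = 0;
      final NodeList pairs = doc.getElementsByTagName( "nameValuePair" );
      for ( int i = 0; i < pairs.getLength(); i++ ) {
         final Element pair = (Element) pairs.item( i );
         final NodeList names = pair.getElementsByTagName( "name" );
         final NodeList strings = pair.getElementsByTagName( "string" );
         if ( names.getLength() > 0 && strings.getLength() > 0
              && name.equals( names.item( 0 ).getTextContent().trim() ) ) {
            strings.item( 0 ).setTextContent( value );
            changed++;
         }
      }
      return changed;
   }

   // Parses descriptor XML held in memory.
   static Document parse( final String xml ) throws Exception {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse( new InputSource( new StringReader( xml ) ) );
   }

   // Serializes the (possibly modified) document back to a String.
   static String serialize( final Document doc ) throws Exception {
      final StringWriter out = new StringWriter();
      TransformerFactory.newInstance().newTransformer()
            .transform( new DOMSource( doc ), new StreamResult( out ) );
      return out.toString();
   }
}
```

Calling setStringParameter( doc, "InputDirectory", "/path/to/notes" ) and then writing the result of serialize( doc ) back to the .xml file would change the reader's input path; a return value of 0 means no parameter with that name was found, so the name needs checking.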

Sean


Re: Question about the pipeline

2015-02-03 Thread Maite Meseure Hugues
Oh yes, my apologies, I mixed up RunCPE, which takes the cpe.xml, and
BagofCuisGenerator, which takes the input and output directories as arguments.
Thanks for the pointer to CmdLineCpeRunner, I hadn't seen that.


git mirrors out of sync?

2015-02-03 Thread Steven Bethard
The git mirrors for cTAKES seem to be either broken
(http://git.apache.org/ctakes.git) or embarrassingly out of sync
(https://github.com/apache/ctakes). Is this a known issue? I looked at
the INFRA ticket [1], but didn't see anything that suggested that
there should be a problem.

Steve

[1] https://issues.apache.org/jira/browse/INFRA-8553


RE: git mirrors out of sync?

2015-02-03 Thread Finan, Sean
Hi Steve,

You are right (confirming your finding) - it looks like the first is a no-show 
and the second is somebody's personal upload to github (not git.apache.org) 
from 3 years ago.  The jira claims that the item was closed (fixed), but if you 
go to http://git.apache.org/ cTakes is not listed.  Was it there previous to 6 
days ago but removed? 

If nobody responds with a "here's yer problem" by end of week then I ( or you, 
if you like) will ping infra.  I know that at least one contributor (not me) 
prefers to use git.

Sean



Re: Question about the pipeline

2015-02-03 Thread Tol O .
Sean,

Thank you for the detailed reply.

As you mentioned, I had to revert the capital letters from your Outlook. Also,
in case somebody else wants to use the code and cannot get it to run: the
getFilesInDir method needs to return the populated Collection fileList; the
variable final File[] fileList and its usages should be renamed to something
else (as the variable name already exists); and the main method needs to
throw an IOException.

I think these were all the changes I made so that the txt files from a
folder are added to the collection, many thanks again.
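
For anyone following along, the changes listed above can be sketched roughly as below. This is an untested-against-cTAKES sketch rather than the exact code used: the class name NoteFileReader is made up, the listing array is renamed to "listing", and three things go beyond the changes described (an exception replaces System.exit, an isFile() check skips subdirectories, and UTF-8 is assumed for the note files).

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collection;

// Hypothetical class name; only JDK classes are used.
final class NoteFileReader {

   // Returns every readable regular file directly inside the given directory.
   static Collection<File> getFilesInDir( final File directory ) throws IOException {
      final File[] listing = directory.listFiles();
      if ( listing == null ) {
         // listFiles() returns null when the path is not a directory or cannot be read
         throw new IOException( "please check the directory " + directory.getAbsolutePath() );
      }
      final Collection<File> fileList = new ArrayList<>();
      for ( final File file : listing ) {
         if ( file.isFile() && file.canRead() ) {
            fileList.add( file );
         }
      }
      return fileList;
   }

   // Reads the whole file into a String; UTF-8 is an assumption about the notes.
   static String getTextInFile( final File file ) throws IOException {
      return new String( Files.readAllBytes( file.toPath() ), StandardCharsets.UTF_8 );
   }

   public static void main( final String... args ) throws IOException {
      if ( args.length == 0 || args[ 0 ].isEmpty() ) {
         System.out.println( "Enter a directory path" );
         return;
      }
      for ( final File file : getFilesInDir( new File( args[ 0 ] ) ) ) {
         final String note = getTextInFile( file );
         // pipeline code a la ExampleAggregatePipeline would go here
         System.out.println( file.getName() + ": " + note.length() + " characters" );
      }
   }
}
```

Checking args.length before args[0] avoids an ArrayIndexOutOfBoundsException when the program is started without any argument.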

What I am looking to do is also what the description in
"ExampleAggregatePipeline" says, "running a pipeline programatically w/o
uima xml descriptor xml files". As I understand it, this is accomplished by
the uimaFIT classes, so that AEs can be defined in Java, added to a pipeline
and run directly.

The uimaFIT page gives a nice Java snippet that uses uimaFIT in a similar
way as the cTAKES example, I pasted the few Java lines below at [1]. 
http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.introduction

I would like to use cTAKES in my own Java programs such that, just like the
ExampleAggregatePipeline, uimaFIT can be used to create and run a cTAKES
pipeline to annotate medical texts. Then, I could also output the result in
CAS files, just like the CVD GUI is doing. This would allow me to directly
add or modify my own AnalysisEngines.

Essentially, I want to know how to set up the cTAKES objects correctly into
a pipeline in a Java program, so that medical texts are annotated, like the
GUI is doing. I would really appreciate any hints on how to accomplish this.

Following your code example to read the files the outlined idea is:

for ( File file : files ) {
  final String note = getTextInFile( file );
  JCas jCas = JCasFactory.createJCas();
  jCas.setDocumentText(note);

  // 1. create the AnalysisEngines for tokenizer, tagger and other
  //    cTAKES components etc. to annotate medical texts
  // 2. runPipeline(jCas, ...);
}

[1]
The code snippet from uimaFIT:

JCas jCas = JCasFactory.createJCas();

jCas.setDocumentText("some text");

AnalysisEngine tokenizer = createEngine(MyTokenizer.class);

AnalysisEngine tagger = createEngine(MyTagger.class);

runPipeline(jCas, tokenizer, tagger);

for(Token token : iterate(jCas, Token.class)){
    System.out.println(token.getTag());
}

Tol O.



RE: Question about the pipeline

2015-02-03 Thread Finan, Sean
Hi Tol,

> Essentially, I want to know how to set up the cTAKES objects correctly into a 
> pipeline in a Java program, so that medical texts are annotated, like the 
> GUI is doing. I would really appreciate any hints on how to accomplish this.

Looking at your embedded code I think that you've got the general idea of how 
to do everything.  Perhaps you are wondering how to create custom pipelines by 
programmatically adding chosen processors?

Tim Miller made a great addition (imo) to the cTakes code with the 
org.apache.ctakes.clinicalpipeline.ClinicalPipelineFactory class.  Perhaps you 
can take a look at that and see if it helps?

Sean


Re: git mirrors out of sync?

2015-02-03 Thread Steven Bethard
I added a comment to that infra ticket before I posted here, but no
response so far.
