Hi Thomas,

Short answer:
You can't do that.  The collection of Section definitions is shared across all 
of the pipelines.

Long answer:
I think that there might be another approach.

My guess is that within your two different note types there is some common 
section header expression, but the content, intent, and use of the section 
information are different.

If that is the case, I would propose the following:

1.  Use just a single sectionizer.
-- Sectionization, as with any regex process, can be "slow".  It is better to 
detect a common word by running a single regex over the text than by running 
two different regexes that look for the same word.
2.  Use one pipeline definition.
-- When two unlike pipelines run simultaneously, if processing n notes of 
type A takes X seconds and processing n' notes of type B takes >>X seconds, 
then you are stuck waiting on the B processing time.
-- It also makes the later description of a single pipeline easier ...  as 
below (hopefully).
3.  Make a simple annotation engine that determines note type and adjusts the 
properties of sections identified with the common section header based upon the 
note type.
-- The complexity of this depends upon the differences in sections with common 
headers.
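
To illustrate points 1 and 3 together, here is a minimal sketch in plain Java 
(no cTAKES classes).  One compiled pattern finds every common header in a 
single pass; what each header means can then be decided afterwards from the 
note type.  The class name, the header names, and the "Header:" line format are 
assumptions for illustration only, not what BsvRegexSectionizer does 
internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleHeaderPassSketch {

   // Hypothetical header names.  A real pipeline would take its header
   // expressions from the sections BSV file instead of hard coding them.
   private static final Pattern HEADER
         = Pattern.compile( "^(Stern|Bilge|Elbow):", Pattern.MULTILINE );

   // Returns "name@offset" for each header found, in document order.
   public static List<String> findHeaders( String text ) {
      List<String> headers = new ArrayList<>();
      Matcher matcher = HEADER.matcher( text );
      while ( matcher.find() ) {
         headers.add( matcher.group( 1 ) + "@" + matcher.start() );
      }
      return headers;
   }

   public static void main( String[] args ) {
      findHeaders( "Stern: ok\nnotes\nElbow: sore\n" ).forEach( System.out::println );
   }
}
```

The text is scanned once regardless of note type, and the note-type-specific 
handling happens on the small list of headers rather than on the whole text.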

-- Please Note: I am typing this freehand, so there are probably typos and 
missing items.  There are also probably better ways to do the same thing.  It 
should give you the general idea.  A lot of people in the community don't dream 
in Java so I sometimes add this kind of thing to (hopefully) save time.


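// Assumes java.util.* imports plus the NoteSpecs, Segment and JCasUtil classes.
String noteType = new NoteSpecs( jCas ).getDocumentType();

List<Segment> sections = new ArrayList<>( JCasUtil.select( jCas, Segment.class ) );
// Sort by begin offset so that "previous section" means previous in the text.
sections.sort( Comparator.comparingInt( Segment::getBegin ) );

if ( sections.size() <= 1 ) {
   return;
}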

//  Join sections if one is unwanted.
Collection<Segment> unwantedSections = new HashSet<>();
Segment previousSection = sections.get( 0 );
for ( int i=1; i<sections.size(); i++ ) {
   Segment section = sections.get( i );
   if ( !isWantedSection( noteType, section.getPreferredText() ) ) {
      // Extend the previous section over the unwanted one, then drop it.
      previousSection.setEnd( section.getEnd() );
      unwantedSections.add( section );
      section.removeFromIndexes();
      continue;
   }
   previousSection = section;
}
sections.removeAll( unwantedSections );

// Rename Sections
sections.forEach( s -> adjustSectionInfo( noteType, s ) );


//  Something to define unwanted sections:
Collection<String> BAD_A_SECTIONS = Arrays.asList( "Bilge", "Plumbing" );
Collection<String> BAD_B_SECTIONS = Arrays.asList( "Joint", "Elbow" );
boolean isWantedSection( String noteType, String sectionType ) {
   // A section is unwanted when it is on the bad list for the current note type.
   boolean unwanted
         = ( noteType.equals( "A" ) && BAD_A_SECTIONS.contains( sectionType ) )
           || ( noteType.equals( "B" ) && BAD_B_SECTIONS.contains( sectionType ) );
   return !unwanted;
}

// And something to adjust properties of certain section types:
Map<String,String> X_TO_A_SECTIONS = new HashMap<>();
Map<String,String> X_TO_B_SECTIONS = new HashMap<>();
void initRenameMaps() {
   X_TO_A_SECTIONS.put( "Stern", "Sternum" );
   X_TO_B_SECTIONS.put( "Stern", "Tough Guy" );
}
void adjustSectionInfo( String noteType, Segment section ) {
   if ( noteType.equals( "A" ) ) {
      String newName = X_TO_A_SECTIONS.get( section.getPreferredText() );
      if ( newName != null ) {
         section.setPreferredText( newName );
      }
   } else if ( noteType.equals( "B" ) ) {
      // etc.
   }
}
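
For anyone who wants to try the join-and-rename logic without a cTAKES 
install, the following is a minimal self-contained sketch of the same idea.  
The Section class is a hypothetical stand-in for the cTAKES Segment annotation, 
the class and method names are made up for illustration, and the B branch is 
filled in on the assumption that it mirrors the A branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SectionAdjusterSketch {

   // Hypothetical stand-in for the cTAKES Segment annotation.
   public static final class Section {
      public int begin;
      public int end;
      public String name;

      public Section( int begin, int end, String name ) {
         this.begin = begin;
         this.end = end;
         this.name = name;
      }
   }

   static final Set<String> BAD_A_SECTIONS = new HashSet<>( Arrays.asList( "Bilge", "Plumbing" ) );
   static final Set<String> BAD_B_SECTIONS = new HashSet<>( Arrays.asList( "Joint", "Elbow" ) );
   static final Map<String, String> X_TO_A_SECTIONS = new HashMap<>();
   static final Map<String, String> X_TO_B_SECTIONS = new HashMap<>();
   static {
      X_TO_A_SECTIONS.put( "Stern", "Sternum" );
      X_TO_B_SECTIONS.put( "Stern", "Tough Guy" );
   }

   public static boolean isWantedSection( String noteType, String sectionName ) {
      if ( "A".equals( noteType ) ) {
         return !BAD_A_SECTIONS.contains( sectionName );
      }
      if ( "B".equals( noteType ) ) {
         return !BAD_B_SECTIONS.contains( sectionName );
      }
      return true;  // Unknown note types keep every section.
   }

   // Sorts by begin offset, folds unwanted sections into the section before
   // them, then renames what is left according to the note type.
   public static List<Section> adjust( String noteType, Collection<Section> input ) {
      List<Section> sorted = new ArrayList<>( input );
      sorted.sort( Comparator.comparingInt( s -> s.begin ) );
      List<Section> kept = new ArrayList<>();
      for ( Section section : sorted ) {
         // The first section is always kept so there is something to merge into.
         if ( !kept.isEmpty() && !isWantedSection( noteType, section.name ) ) {
            kept.get( kept.size() - 1 ).end = section.end;
            continue;
         }
         kept.add( section );
      }
      Map<String, String> renames = "A".equals( noteType ) ? X_TO_A_SECTIONS
            : "B".equals( noteType ) ? X_TO_B_SECTIONS
            : Collections.<String, String>emptyMap();
      for ( Section section : kept ) {
         section.name = renames.getOrDefault( section.name, section.name );
      }
      return kept;
   }
}
```

In a real annotator the same steps would run on Segment annotations taken from 
the JCas, and each dropped section would also need to be removed from the CAS 
indexes.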



Sean



________________________________________
From: Thomas W Loehfelm <twloehf...@ucdavis.edu.INVALID>
Sent: Friday, January 29, 2021 7:25 PM
To: dev@ctakes.apache.org
Subject: Re: Passing SectionsBsv to piper containing BsvRegexSectionizer 
[EXTERNAL]

Sorry for the second email.

The a_engine and b_engine lines contain typos in that they do not specify the 
specific a_ or b_pipeline – I inadvertently introduced this typo just while 
reproducing the generic example into the email – the original code is correct 
so that is not the source of the problem.

And to further clarify, the general concept works – both AE pools are created, 
and both can process text, it is literally just that the SectionsBsv param 
setting persists between the two so that the second pool ends up using the same 
BSV file as the first one.


From: Thomas W Loehfelm <twloehf...@ucdavis.edu.INVALID>
Date: Friday, January 29, 2021 at 4:11 PM
To: dev@ctakes.apache.org <dev@ctakes.apache.org>
Subject: Passing SectionsBsv to piper containing BsvRegexSectionizer
I have a cTAKES API endpoint based on the REST API and I am trying to specify 
a different BSV file depending on the type of text.

My idea is to instantiate two different analysis engine pools, and direct text 
to one or the other depending on which type of report it is. This seems simpler to 
me than spinning up two entirely separate ctakes end points and using one for 
one type and one for the other, though I know that I could accomplish what I am 
looking to do by going that direction. It seems like I am missing something 
basic that is preventing my initial plan from working though.

Let’s say the different AE pools are A and B as below, and say the PIPER_FILEs 
at the paths are the same except they hard code a different Bsv file like so:
A_PIPER_FILE includes: add BsvRegexSectionizer SectionsBsv=resources/a.bsv
B_PIPER_FILE includes: add BsvRegexSectionizer SectionsBsv=resources/b.bsv

final PiperFileReader a_reader = new PiperFileReader(A_PIPER_FILE_PATH);
final PipelineBuilder a_builder = a_reader.getBuilder();
final AnalysisEngineDescription a_pipeline = a_builder.getAnalysisEngineDesc();
_a_engine = UIMAFramework.produceAnalysisEngine(pipeline);
_a_pool = new JCasPool( 2, _a_engine );

final PiperFileReader b_reader = new PiperFileReader(B_PIPER_FILE_PATH);
final PipelineBuilder b_builder = b_reader.getBuilder();
final AnalysisEngineDescription b_pipeline = b_builder.getAnalysisEngineDesc();
_b_engine = UIMAFramework.produceAnalysisEngine(pipeline);
_b_pool = new JCasPool( 2, _b_engine );


The problem I am running into is that the “B” analysis engine uses the “A” 
SectionsBsv file even though the piper files specify the correct one to use. It 
seems that once SectionsBsv is set once, it is not reset even though a 
subsequent piper file may specify a different resource to use.

Any ideas on what is happening, how I can clear or reset that param, or whether 
there is a different way to accomplish what I am trying to do?

Things I have tried:

  1.  Adding “_b_engine.reconfigure();” between _b_engine and _b_pool lines.
     *   No effect.
  2.  Removing the hard-coded SectionsBsv assignment from the piper file, using 
the SAME piper file for each instance, and passing in SectionsBsv as a param.
     *   I am not sure how to do this using the construction above. I have 
looked into CliOptionals but do not have a good grasp of them.
     *   I have tried adding “a_builder.set("SectionsBsv", "resources/a.bsv")” 
after the a_builder is created but that had no effect either.

Thanks in advance for your consideration.

Tom