Re: cTakes Annotation Comparison

Bruce Tietjen Fri, 19 Dec 2014 12:37:46 -0800

My original results were using a newly downloaded cTakes 3.2.1 with the
separately downloaded resources copied in. There were no changes to any of
the configuration files.


For this last run, I modified UmlsLookupAnnotator.xml and
AggregatePlaintextFastUMLSProcessor.xml. I've attached the modified versions I
used (though they may not get through the mailing list).
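
In case the attachments do not come through, here is a rough way to check which settings a descriptor actually carries. This is only a sketch: the embedded XML is a trimmed stand-in for the settings block of the attached UmlsLookupAnnotator.xml, and the helper name read_settings is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Default namespace used by UIMA descriptors (as seen in the attached files).
NS = {"u": "http://uima.apache.org/resourceSpecifier"}

# Trimmed stand-in for the attached descriptor; only two settings are kept.
DESCRIPTOR = """
<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <analysisEngineMetaData>
    <configurationParameterSettings>
      <nameValuePair>
        <name>windowAnnotations</name>
        <value>
          <string>org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation</string>
        </value>
      </nameValuePair>
      <nameValuePair>
        <name>minimumSpan</name>
        <value>
          <string>2</string>
        </value>
      </nameValuePair>
    </configurationParameterSettings>
  </analysisEngineMetaData>
</taeDescription>
"""


def read_settings(xml_text):
    """Map each nameValuePair name to its first <string> value."""
    root = ET.fromstring(xml_text.strip())
    settings = {}
    for pair in root.iterfind(".//u:nameValuePair", NS):
        name = pair.findtext("u:name", namespaces=NS)
        value = pair.findtext("u:value/u:string", namespaces=NS)
        settings[name] = value.strip() if value is not None else None
    return settings


settings = read_settings(DESCRIPTOR)
for name, value in settings.items():
    print(f"{name} = {value}")
```

Running the same function over the stock and the modified descriptor and comparing the two dictionaries would show exactly what changed between runs.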



Bruce Tietjen
Senior Software Engineer
IMAT Solutions <http://imatsolutions.com>
Mobile: 801.634.1547
[email protected]

On Fri, Dec 19, 2014 at 1:27 PM, Finan, Sean <
[email protected]> wrote:
>
> Hi Bruce,
>
> I'm not sure how there would be fewer matches with the overlap processor.
> There should be all of the matches from the non-overlap processor plus
> those from the overlap.  Decreasing from 215 to 211 is strange.  Have you
> done any manual spot checks on this?  It is really bizarre that you'd only
> have two matches per document (100 docs?).
>
> Thanks,
> Sean
>
> -----Original Message-----
> From: Bruce Tietjen [mailto:[email protected]]
> Sent: Friday, December 19, 2014 3:23 PM
> To: [email protected]
> Subject: Re: cTakes Annotation Comparison
>
> Sean,
>
> I tried the configuration changes you mentioned in your earlier email.
>
> The results are as follows:
>
> Total Annotations found: 12,161 (default configuration found 8,284)
>
> If counting exact span matches, this run only matched 211 (default
> configuration matched 215).
>
> If counting overlapping spans, this run only matched 220 (default
> configuration matched 224)
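
For anyone reproducing these two counts, the difference between "exact span" and "overlapping span" matching can be sketched roughly as below. This is not the code of the compare tool; it assumes annotations are (begin, end, CUI) tuples with an exclusive end offset and that the CUI must agree in both modes. The CUIs and offsets are placeholders.

```python
def count_matches(system, gold, allow_overlap=False):
    """Count gold annotations matched by at least one system annotation.

    Annotations are (begin, end, cui) tuples; end is exclusive.
    """
    matched = 0
    for g_begin, g_end, g_cui in gold:
        for s_begin, s_end, s_cui in system:
            if s_cui != g_cui:
                continue
            exact = (s_begin, s_end) == (g_begin, g_end)
            overlap = s_begin < g_end and g_begin < s_end
            if exact or (allow_overlap and overlap):
                matched += 1
                break  # each gold annotation is counted at most once
    return matched


# Placeholder data: one exact hit, one partial overlap, one miss.
gold = [(0, 10, "C0000001"), (20, 35, "C0000002"), (40, 50, "C0000003")]
system = [(0, 10, "C0000001"), (25, 35, "C0000002"), (60, 70, "C0000003")]

exact_count = count_matches(system, gold)
overlap_count = count_matches(system, gold, allow_overlap=True)
print(exact_count, overlap_count)
```

Under this definition the overlap count can never be lower than the exact count for the same run, which is consistent with 220 vs 211 and 224 vs 215 above.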
>
> Bruce
>
>
>
>
> On Fri, Dec 19, 2014 at 12:16 PM, Chen, Pei <
> [email protected]>
> wrote:
> >
> >  Kim,
> >
> > Maintenance, not bugs/issues, is the reason to forge ahead.
> >
> > These are 2 components that do the same thing with the same goal. (As
> > Sean mentioned, one should be able to configure the new code base to
> > replicate the old algorithm if required; it’s just a simpler and
> > cleaner code base. If this is not the case, or if there are issues, we
> > should fix them and move forward.)
> >
> > We can keep the old component around for as long as needed, but it’s
> > likely going to have limited support…
> >
> > --Pei
> >
> >
> >
> > *From:* Kim Ebert [mailto:[email protected]]
> > *Sent:* Friday, December 19, 2014 1:47 PM
> > *To:* Chen, Pei; [email protected]
> >
> > *Subject:* Re: cTakes Annotation Comparison
> >
> >
> >
> > Pei,
> >
> > I don't think bugs/issues should be part of determining if one
> > algorithm vs the other is superior. Obviously, it is worth mentioning
> > the bugs, but if the fast lookup method has worse precision and recall
> > but better performance, vs the slower but more accurate first word
> > lookup algorithm, then time should be invested in fixing those bugs
> > and resolving those weird issues.
> >
> > Now I'm not saying which one is superior in this case, as the data
> > will end up speaking for itself one way or the other; but as of right
> > now, I'm not convinced that the old dictionary lookup is obsolete,
> > and I'm not sure the community is convinced yet either.
> >
> >
> >
> > Kim Ebert
> > Software Engineer
> > IMAT Solutions <http://imatsolutions.com>
> > Office: 801.669.7342
> > [email protected]
> >
> > On 12/19/2014 08:39 AM, Chen, Pei wrote:
> >
> > Also check out stats that Sean ran before releasing the new component on:
> >
> >
> > http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-dictionary-lookup-fast/doc/DictionaryLookupStats.docx
> >
> > From the evaluation and experience, the new lookup algorithm should be
> > a huge improvement in terms of both speed and accuracy.
> >
> > This is very different from what Bruce mentioned…  I’m sure Sean will
> > chime in here.
> >
> > (The old dictionary lookup is essentially obsolete now- plagued with
> > bugs/issues as you mentioned.)
> >
> > --Pei
> >
> >
> >
> > *From:* Kim Ebert [mailto:[email protected]
> > <[email protected]>]
> > *Sent:* Friday, December 19, 2014 10:25 AM
> > *To:* [email protected]
> > *Subject:* Re: cTakes Annotation Comparison
> >
> >
> >
> > Guergana,
> >
> > I'm curious about the number of records in your gold standard
> > sets, or whether your gold standard set was run through a long-running
> > cTAKES process.
> > I know at some point we fixed a bug in the old dictionary lookup that
> > caused the permutations to become corrupted over time. Typically this
> > isn't seen in the first few records, but over time as patterns are
> > used the permutations would become corrupted. This caused documents
> > that were fed through cTAKES more than once to have fewer codes
> > returned than the first time.
> >
> > For example, if a permutation of 4,2,3,1 was found, the permutation
> > would be corrupted to be 1,2,3,4. It would no longer be possible to
> > detect permutations of 4,2,3,1 until cTAKES was restarted. We got the
> > fix in after the cTAKES 3.2.0 release.
> > https://issues.apache.org/jira/browse/CTAKES-310
> > Depending upon the corpus size, I could see the permutation engine
> > eventually having only a single permutation of 1,2,3,4.
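
To make the failure mode described above concrete, here is a purely illustrative sketch. It is not the cTAKES code, and the cause it shows (an in-place sort on a shared, cached permutation list) is an assumption chosen only because it reproduces the described symptom; the function names are hypothetical.

```python
def lookup_buggy(cached_perms, wanted):
    """Find `wanted` in the cache, then sort the cached list in place (the bug)."""
    for perm in cached_perms:
        if perm == wanted:
            perm.sort()  # mutates the shared cache entry: 4,2,3,1 becomes 1,2,3,4
            return True
    return False


def lookup_fixed(cached_perms, wanted):
    """Same lookup, but any reordering is done on a copy."""
    for perm in cached_perms:
        if perm == wanted:
            _ordered = sorted(perm)  # sorted() returns a new list; cache is untouched
            return True
    return False


buggy_cache = [[1, 2, 3, 4], [4, 2, 3, 1]]
first_buggy = lookup_buggy(buggy_cache, [4, 2, 3, 1])
second_buggy = lookup_buggy(buggy_cache, [4, 2, 3, 1])

fixed_cache = [[1, 2, 3, 4], [4, 2, 3, 1]]
first_fixed = lookup_fixed(fixed_cache, [4, 2, 3, 1])
second_fixed = lookup_fixed(fixed_cache, [4, 2, 3, 1])

print(first_buggy, second_buggy, first_fixed, second_fixed)
```

In the buggy variant the second lookup fails because the cache now holds 1,2,3,4 twice, which mirrors the "works on the first pass, degrades until restart" behaviour.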
> >
> > Typically though, this isn't very easily detected in the first 100 or
> > so documents.
> >
> > We discovered this issue when we made cTAKES have consistent output of
> > codes in our system.
> >
> >
> >
> >
> > On 12/19/2014 07:05 AM, Savova, Guergana wrote:
> >
> > We are doing a similar kind of evaluation and will report the results.
> >
> >
> >
> > Before we released the Fast lookup, we did a systematic evaluation
> > across three gold standard sets. We did not see the trend that Bruce
> > reported below. The P, R and F1 results from the old dictionary lookup
> > and the fast one were similar.
> >
> >
> >
> > Thank you everyone!
> >
> > --Guergana
> >
> >
> >
> > -----Original Message-----
> >
> > From: David Kincaid [mailto:[email protected]
> > <[email protected]>]
> >
> > Sent: Friday, December 19, 2014 9:02 AM
> >
> > To: [email protected]
> >
> > Subject: Re: cTakes Annotation Comparison
> >
> >
> >
> > Thanks for this, Bruce! Very interesting work. It confirms what I've
> > seen in my small tests that I've done in a non-systematic way. Did you
> > happen to capture the number of false positives yet (annotations made by
> > cTAKES that are not in the human adjudicated standard)? I've seen a lot of
> > dictionary hits that are not actually entity mentions, but I haven't had a
> > chance to do a systematic analysis (we're working on our annotated gold
> > standard now). One great example is the antibiotic "Today". Every time the
> > word "today" appears in any text it is annotated as a medication mention,
> > even though it is almost never being used in that sense.
> >
> >
> >
> > These results by themselves are quite disappointing to me. Both the
> > UMLSProcessor and especially the FastUMLSProcessor seem to have pretty poor
> > recall. It seems like the trade-off for more speed is a ten-fold (or more)
> > decrease in entity recognition.
> >
> >
> >
> > Thanks again for sharing your results with us. I think they are very
> > useful to the project.
> >
> >
> >
> > - Dave
> >
> >
> >
> > On Thu, Dec 18, 2014 at 5:06 PM, Bruce Tietjen <[email protected]> wrote:
> >
> >
> >
> > Actually, we are working on a similar tool to compare it to the human
> > adjudicated standard for the set we tested against. I didn't mention
> > it before because the tool isn't complete yet, but initial results for
> > the set (excluding those marked as "CUI-less") were as follows:
> >
> > Human adjudicated annotations: 4591 (excluding CUI-less)
> >
> > Annotations found matching the human adjudicated standard:
> > UMLSProcessor        2245
> > FastUMLSProcessor     215
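
As a quick worked check, the counts above translate into recall against the adjudicated set as follows. The numbers are taken from the message; treating "matching annotations / human adjudicated annotations" as recall is the only assumption here.

```python
# Counts reported above (excluding CUI-less annotations).
gold_total = 4591
matched = {"UMLSProcessor": 2245, "FastUMLSProcessor": 215}

# Recall = gold annotations found / all gold annotations.
recall = {name: hits / gold_total for name, hits in matched.items()}
for name, value in recall.items():
    print(f"{name}: recall = {value:.3f}")

# How many times more gold annotations the old pipeline found.
ratio = matched["UMLSProcessor"] / matched["FastUMLSProcessor"]
print(f"UMLSProcessor found {ratio:.1f}x as many gold annotations")
```

That works out to roughly 49% recall for the UMLSProcessor and under 5% for the FastUMLSProcessor, a gap of a little over ten-fold.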
> >
> >
> >
> > On Thu, Dec 18, 2014 at 3:37 PM, Chen, Pei <[email protected]> wrote:
> >
> >
> >
> > Bruce,
> > Thanks for this-- very useful.
> > Perhaps Sean Finan can comment more,
> > but it's also probably worth comparing against an adjudicated
> > human-annotated gold standard.
> >
> >
> >
> > --Pei
> >
> >
> >
> > -----Original Message-----
> >
> > From: Bruce Tietjen [mailto:[email protected]
> > <[email protected]>]
> >
> > Sent: Thursday, December 18, 2014 1:45 PM
> >
> > To: [email protected]
> >
> > Subject: cTakes Annotation Comparison
> >
> >
> >
> > With the recent release of cTakes 3.2.1, we were very interested in
> > checking for any differences in annotations between using the
> > AggregatePlaintextUMLSProcessor pipeline and the
> > AggregatePlaintextFastUMLSProcessor pipeline within this release of
> > cTakes with its associated set of UMLS resources.
> >
> > We chose to use the SHARE 14-a-b Training data that consists of 199
> > documents (Discharge 61, ECG 54, Echo 42 and Radiology 42) as the
> > basis for the comparison.
> >
> > We decided to share a summary of the results with the development
> > community.
> >
> >
> >
> > Documents Processed: 199
> >
> > Processing Time:
> > UMLSProcessor        2,439 seconds
> > FastUMLSProcessor    1,837 seconds
> >
> > Total Annotations Reported:
> > UMLSProcessor        20,365 annotations
> > FastUMLSProcessor     8,284 annotations
> >
> > Annotation Comparisons:
> > Annotations common to both sets:                       3,940
> > Annotations reported only by the UMLSProcessor:       16,425
> > Annotations reported only by the FastUMLSProcessor:    4,344
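
The figures above are internally consistent, and a few derived numbers help put them in perspective. The sketch below only re-uses the reported counts; the choice of derived measures (share of each output that is common, Jaccard overlap, speed ratio) is mine.

```python
# Reported counts.
common = 3940
only_umls = 16425
only_fast = 4344
total_umls = 20365
total_fast = 8284
seconds_umls = 2439
seconds_fast = 1837

# Each total should equal the common part plus that pipeline's unique part.
assert common + only_umls == total_umls
assert common + only_fast == total_fast

share_of_umls = common / total_umls   # fraction of UMLSProcessor output also in Fast
share_of_fast = common / total_fast   # fraction of Fast output also in UMLSProcessor
jaccard = common / (common + only_umls + only_fast)
speedup = seconds_umls / seconds_fast

print(f"common share of UMLSProcessor output: {share_of_umls:.1%}")
print(f"common share of FastUMLSProcessor output: {share_of_fast:.1%}")
print(f"Jaccard overlap: {jaccard:.3f}")
print(f"speed ratio (old/fast): {speedup:.2f}")
```

So less than half of the Fast pipeline's annotations also appear in the old pipeline's output, while the run time improves by roughly a third.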
> >
> >
> >
> >
> >
> > If anyone is interested, the following was our test procedure:
> >
> > We used the UIMA CPE to process the document set twice, once using
> > the AggregatePlaintextUMLSProcessor pipeline and once using the
> > AggregatePlaintextFastUMLSProcessor pipeline. We used the
> > WriteCAStoFile CAS consumer to write the results to output files.
> >
> > We used a tool we recently developed to analyze and compare the
> > annotations generated by the two pipelines. The tool compares the
> > two outputs for each file and reports any differences in the
> > annotations (MedicationMention, SignSymptomMention,
> > ProcedureMention, AnatomicalSiteMention, and
> > DiseaseDisorderMention) between the two output sets. The tool
> > reports the number of 'matches' and 'misses' between the annotation
> > sets. A 'match' is defined as the presence of an identified source text
> > interval with its associated CUI appearing in both annotation sets. A
> > 'miss' is defined as the presence of an identified source text interval
> > and its associated CUI in one annotation set, but no matching identified
> > source text interval and CUI in the other. The tool also reports the
> > total number of annotations (source text intervals with associated CUIs)
> > reported in each annotation set. The compare tool is in our GitHub
> > repository at
> > https://github.com/perfectsearch/cTAKES-compare
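
The 'match' and 'miss' definitions above amount to set operations on (interval, CUI) pairs. A minimal sketch, assuming each annotation is reduced to a (begin, end, CUI) tuple; this is not the code from the cTAKES-compare repository, and the function name, offsets and CUIs are placeholders.

```python
def compare_annotations(set_a, set_b):
    """Compare two sets of (begin, end, cui) tuples from the two pipelines."""
    return {
        "matches": set_a & set_b,      # same interval and CUI in both sets
        "only_in_a": set_a - set_b,    # 'misses': present in A, absent from B
        "only_in_b": set_b - set_a,    # 'misses': present in B, absent from A
        "total_a": len(set_a),
        "total_b": len(set_b),
    }


# Placeholder output for one document from each pipeline.
umls_output = {(0, 8, "C0000001"), (15, 27, "C0000002"), (30, 41, "C0000003")}
fast_output = {(0, 8, "C0000001"), (15, 27, "C0000009"), (50, 58, "C0000004")}

report = compare_annotations(umls_output, fast_output)
print(len(report["matches"]), len(report["only_in_a"]), len(report["only_in_b"]))
```

Note that under this strict definition the same text interval with a different CUI counts as a miss on both sides, as with the second annotation in the example.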
> >
> >
> >
> >
> >
> >
> >
> >
> >
>

<?xml version="1.0" encoding="UTF-8"?>
<!--

    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

-->
<taeDescription xmlns="http://uima.apache.org/resourceSpecifier">
   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
   <primitive>true</primitive>
   <annotatorImplementationName>org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator
   </annotatorImplementationName>
   <analysisEngineMetaData>
      <name>UmlsLookupAnnotator</name>
      <description>Lookup Annotator descriptor for Snomed Terms which are in a Rare Word -format Database, Ctakes
      </description>
      <version/>
      <vendor/>

      <configurationParameters>
         <!-- windowAnnotations and exclusionTags were originally for the LookupConsumer, but now apply to the annotator -->
         <configurationParameter>
            <name>windowAnnotations</name>
            <description>Type of window to use for lookup</description>
            <type>String</type>
            <multiValued>false</multiValued>
            <mandatory>true</mandatory>
         </configurationParameter>
         <configurationParameter>
            <name>exclusionTags</name>
            <description>Parts of speech to ignore when considering lookup tokens</description>
            <type>String</type>
            <multiValued>false</multiValued>
            <mandatory>false</mandatory>
         </configurationParameter>
         <configurationParameter>
            <name>minimumSpan</name>
            <description>Minimum required span length of tokens to use for lookup. Default is 3</description>
            <type>String</type>
            <multiValued>false</multiValued>
            <mandatory>false</mandatory>
         </configurationParameter>
      </configurationParameters>

      <configurationParameterSettings>
         <nameValuePair>
            <name>windowAnnotations</name>
            <value>
               <!--  LookupWindowAnnotation is supposed to be a refined Noun Phrase  -->
               <string>org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation</string>
               <!--  In some instances LookupWindowAnnotation is missing tokens and Sentence can be used -->
               <!--<string>org.apache.ctakes.typesystem.type.textspan.Sentence</string> -->
            </value>
         </nameValuePair>
         <nameValuePair>
            <name>exclusionTags</name>
            <value>
               <string>VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB</string>
            </value>
         </nameValuePair>
         <nameValuePair>
            <name>minimumSpan</name>
            <value>
<!--               <string>3</string> -->
               <string>2</string>
            </value>
         </nameValuePair>
      </configurationParameterSettings>

      <typeSystemDescription>
         <imports>
         </imports>
      </typeSystemDescription>
      <typePriorities/>
      <fsIndexCollection/>
      <capabilities>
         <capability>
            <inputs>
               <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.BaseToken</type>
               <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation</type>
               <!--<type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textspan.Sentence</type>-->
            </inputs>
            <outputs>
               <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation</type>
            </outputs>
            <languagesSupported/>
         </capability>
      </capabilities>
      <operationalProperties>
         <modifiesCas>true</modifiesCas>
         <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
         <outputsNewCASes>false</outputsNewCASes>
      </operationalProperties>
   </analysisEngineMetaData>

   <externalResourceDependencies>
      <!-- DictionaryDescriptor is a relatively poorly-named xml that contains parms for dictionary files, dbs, etc. -->
      <!-- why aren't such things just defined here?  The obvious answer is -->
      <externalResourceDependency>
         <key>DictionaryDescriptor</key>
         <description/>
         <interfaceName>org.apache.ctakes.core.resource.FileResource</interfaceName>
         <optional>false</optional>
      </externalResourceDependency>
   </externalResourceDependencies>

   <resourceManagerConfiguration>
      <externalResources>
         <externalResource>
            <!-- The Binding is below, for DictionaryDescriptor = DictionaryDescriptorFile -->
            <name>DictionaryDescriptorFile</name>
            <description/>
            <fileResourceSpecifier>
               <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml</fileUrl>
            </fileResourceSpecifier>
            <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>
         </externalResource>
      </externalResources>

      <externalResourceBindings>
         <externalResourceBinding>
            <key>DictionaryDescriptor</key>
            <resourceName>DictionaryDescriptorFile</resourceName>
         </externalResourceBinding>
      </externalResourceBindings>
   </resourceManagerConfiguration>
</taeDescription>

<?xml version="1.0" encoding="UTF-8"?>
<!--

    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

-->
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="Chunker">
      <import location="../../../ctakes-chunker/desc/Chunker.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="TokenizerAnnotator">
      <import location="../../../ctakes-core/desc/analysis_engine/TokenizerAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="ContextDependentTokenizerAnnotator">
      <import location="../../../ctakes-context-tokenizer/desc/analysis_engine/ContextDependentTokenizerAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
       <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="StatusAnnotator">
      <import location="../../../ctakes-ne-contexts/desc/StatusAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="NegationAnnotator">
      <import location="../../../ctakes-ne-contexts/desc/NegationAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="ExtractionPrepAnnotator">
      <import location="ExtractionPrepAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="SentenceDetectorAnnotator">
      <import location="../../../ctakes-core/desc/analysis_engine/SentenceDetectorAnnotator.xml"/>
    </delegateAnalysisEngine>
     <!-- By default, the dictionary lookup window is Sentence.
          The change was made in 3.2.1 because experiments showed that many terms were missed when relying upon the
          accuracy of LookupWindowAnnotator to correctly identify all present full noun phrases.
          Instead, reliance is now upon the fact that most terms in the dictionary itself are (or fit in) noun phrases.
     To revert to LookupWindowAnnotation:
       1.  uncomment the following lines to load the LookupWindowAnnotator,
       2.  uncomment the LookupWindowAnnotator line in <fixedFlow>,
       3.  uncomment the LookupWindowAnnotation line in <capability> <outputs> <type>
       4.  in ctakes-dictionary-lookup-fast .. /desc/analysis_engine/UmlsLookupAnnotator.xml
       switch the value for <nameValuePair> windowAnnotations.
       LookupWindowAnnotation is still there, just commented
       5.  also uncomment <capability> <inputs> <type> ... LookupWindowAnnotation in UmlsLookupAnnotator.xml
       The AdjustNounPhrase*** annotators have been left in case another module needs them.
       I leave it to somebody with more applicable knowledge to remove them from the flow.
       -->
    <delegateAnalysisEngine key="LookupWindowAnnotator">
      <import location="LookupWindowAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="AdjustNounPhraseToIncludeFollowingNP">
      <import location="../../../ctakes-chunker/desc/AdjustNounPhraseToIncludeFollowingNP.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="AdjustNounPhraseToIncludeFollowingPPNP">
      <import location="../../../ctakes-chunker/desc/AdjustNounPhraseToIncludeFollowingPPNP.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="SimpleSegmentAnnotator">
      <import location="SimpleSegmentAnnotator.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="POSTagger">
      <import location="../../../ctakes-pos-tagger/desc/POSTagger.xml"/>
    </delegateAnalysisEngine>
	<!-- 
	<delegateAnalysisEngine key="ClearPOSTagger">
	<import location="../../../ctakes-pos-tagger/desc/ClearNLPPOSTagger.xml"/>
	</delegateAnalysisEngine>
	 -->    
    <delegateAnalysisEngine key="LvgAnnotator">
      <import location="../../../ctakes-lvg/desc/analysis_engine/LvgAnnotator.xml"/>
    </delegateAnalysisEngine>
<!--     
    <delegateAnalysisEngine key="AssertionAnnotator">
      <import location="../../../ctakes-assertion/desc/AssertionMiniPipelineAnalysisEngine.xml"/>
    </delegateAnalysisEngine>
 -->
     <delegateAnalysisEngine key="GenericCleartkAnalysisEngine">
      <import location="../../../ctakes-assertion/desc/analysis_engine/GenericCleartkAnalysisEngine.xml"/>
     </delegateAnalysisEngine>
     
     <delegateAnalysisEngine key="HistoryCleartkAnalysisEngine">
      <import location="../../../ctakes-assertion/desc/analysis_engine/HistoryCleartkAnalysisEngine.xml"/>
     </delegateAnalysisEngine>
     <delegateAnalysisEngine key="PolarityCleartkAnalysisEngine">
      <import location="../../../ctakes-assertion/desc/analysis_engine/PolarityCleartkAnalysisEngine.xml"/>
     </delegateAnalysisEngine>
     <delegateAnalysisEngine key="SubjectCleartkAnalysisEngine">
      <import location="../../../ctakes-assertion/desc/analysis_engine/SubjectCleartkAnalysisEngine.xml"/>
     </delegateAnalysisEngine>
     <delegateAnalysisEngine key="UncertaintyCleartkAnalysisEngine">
      <import location="../../../ctakes-assertion/desc/analysis_engine/UncertaintyCleartkAnalysisEngine.xml"/>
     </delegateAnalysisEngine>
     
    <delegateAnalysisEngine key="DependencyParser">
      <import location="../../../ctakes-dependency-parser/desc/analysis_engine/ClearNLPDependencyParserAE.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="SemanticRoleLabeler">
      <import location="../../../ctakes-dependency-parser/desc/analysis_engine/ClearNLPSemanticRoleLabelerAE.xml"/>
    </delegateAnalysisEngine>

    <delegateAnalysisEngine key="ConstituencyParser">
      <import location="../../../ctakes-constituency-parser/desc/ConstituencyParserAnnotator.xml"/>
    </delegateAnalysisEngine>
    
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>AggregatePlaintextUMLSProcessor</name>
    <description>Runs the complete pipeline for annotating clinical documents in plain text format using the built in UMLS (SNOMEDCT and RxNORM) dictionaries.  This uses the dictionary lookup/desc/DictionaryLookupAnnotatorUMLS.xml
and requires an UMLS license.  Please update DictionaryLookupAnnotatorUMLS.xml file with your UMLS username and password.
</description>
    <version/>
    <vendor/>
    <configurationParameters searchStrategy="language_fallback">
      <configurationParameter>
        <name>SegmentID</name>
        <description/>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
        <overrides>
          <parameter>SimpleSegmentAnnotator/SegmentID</parameter>
        </overrides>
      </configurationParameter>
      <configurationParameter>
        <name>ChunkCreatorClass</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>true</mandatory>
        <overrides>
          <parameter>Chunker/ChunkCreatorClass</parameter>
        </overrides>
      </configurationParameter>
    </configurationParameters>
    <configurationParameterSettings>
      <nameValuePair>
        <name>ChunkCreatorClass</name>
        <value>
          <string>org.apache.ctakes.chunker.ae.PhraseTypeChunkCreator</string>
        </value>
      </nameValuePair>
    </configurationParameterSettings>
    <flowConstraints>
      <fixedFlow>
        <node>SimpleSegmentAnnotator</node>
        <node>SentenceDetectorAnnotator</node>
        <node>TokenizerAnnotator</node>
        <node>LvgAnnotator</node>
        <node>ContextDependentTokenizerAnnotator</node>
        <node>POSTagger</node>
		<!-- <node>ClearPOSTagger</node>  -->        
        <node>Chunker</node>
        <node>AdjustNounPhraseToIncludeFollowingNP</node>
        <node>AdjustNounPhraseToIncludeFollowingPPNP</node>
        <node>LookupWindowAnnotator</node>
        <node>DictionaryLookupAnnotatorDB</node>
        <node>DependencyParser</node>
		<node>SemanticRoleLabeler</node>        
		<node>ConstituencyParser</node>
        <!-- <node>AssertionAnnotator</node> -->
        <!-- <node>StatusAnnotator</node> -->
       	<!-- <node>NegationAnnotator</node> -->
       	<node>GenericCleartkAnalysisEngine</node>
		<node>HistoryCleartkAnalysisEngine</node>
		<node>PolarityCleartkAnalysisEngine</node>
		<node>SubjectCleartkAnalysisEngine</node>
		<node>UncertaintyCleartkAnalysisEngine</node>
		    
        <node>ExtractionPrepAnnotator</node>
      </fixedFlow>
    </flowConstraints>
    <typePriorities>
      <name>Ordering</name>
      <description>For subiterator</description>
      <version>1.0</version>
      <priorityList>
        <type>org.apache.ctakes.typesystem.type.textspan.Segment</type>
        <type>org.apache.ctakes.typesystem.type.textspan.Sentence</type>
        <type>org.apache.ctakes.typesystem.type.syntax.BaseToken</type>
      </priorityList>
      <priorityList>
        <type>org.apache.ctakes.typesystem.type.textspan.Sentence</type>
        <type>org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation</type>
      </priorityList>
    </typePriorities>
    <fsIndexCollection/>
    <capabilities>
      <capability>
        <inputs/>
        <outputs>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.NewlineToken</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.WordToken</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.VP</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.refsem.UmlsConcept</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.UCP</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.TimeAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.SymbolToken</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textspan.Sentence</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textspan.Segment</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.SBAR</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.RomanNumeralAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.RangeAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.PunctuationToken</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.Property</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.Properties</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.PersonTitleAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.PRT</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.PP</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.OntologyConcept</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.NumToken</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.MeasurementAnnotation</type>
          <type allAnnotatorFeatures="true">edu.mayo.bmi.uima.lookup.type.LookupWindowAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.Lemma</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.LST</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.INTJ</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.FractionAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.structured.DocumentID</type>
          <type allAnnotatorFeatures="true">uima.tcas.DocumentAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.DateAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.CopySrcAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.CopyDestAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.ContractionToken</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.textsem.ContextAnnotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.Chunk</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.CONJP</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.BaseToken</type>
          <type allAnnotatorFeatures="true">uima.cas.AnnotationBase</type>
          <type allAnnotatorFeatures="true">uima.tcas.Annotation</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.ADVP</type>
          <type allAnnotatorFeatures="true">org.apache.ctakes.typesystem.type.syntax.ADJP</type>        
        </outputs>
        <languagesSupported/>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
  <resourceManagerConfiguration/>
</analysisEngineDescription>
