Re: Allergy Annotator

Ks Sunder Thu, 12 Jan 2017 22:11:13 -0800

Thank you, Sean.

   I have written the code for this. I am using Java to read the CSV file,
and for the cTAKES UMLS dictionary lookup I am using the function below.



 public AnalysisEngineDescription getUMLPipeline()
       throws ResourceInitializationException, URISyntaxException {
    AggregateBuilder builder = new AggregateBuilder();
    builder.add( SimpleSegmentAnnotator.createAnnotatorDescription() );
    builder.add( SentenceDetector.createAnnotatorDescription() );
    builder.add( TokenizerAnnotatorPTB.createAnnotatorDescription() );
    builder.add( POSTagger.createAnnotatorDescription() );
    builder.add( ClinicalPipelineFactory.getNpChunkerPipeline() );
    builder.add( LvgAnnotator.createAnnotatorDescription() );

    // Dictionary lookup over Sentence windows
    try {
       builder.add( AnalysisEngineFactory.createEngineDescription(
             DefaultJCasTermAnnotator.class,
             AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
             "org.apache.ctakes.typesystem.type.textspan.Sentence",
             JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
             ExternalResourceFactory.createExternalResourceDescription(
                   FileResourceImpl.class,
                   FileLocator.locateFile(
                         "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) )
       ) );
    } catch ( FileNotFoundException e ) {
       e.printStackTrace();
       throw new ResourceInitializationException( e );
    }

    return builder.createAggregateDescription();
 }


Next, I call this function from here:



 reader = new CSVReader( new FileReader( ExelReadJava.NarrativeFile ) );
 String[] nextLine;
 int lineNumber = 0;

 while ( (nextLine = reader.readNext()) != null ) {
    lineNumber++;
    System.out.println( "Line # " + lineNumber );

    // UML code start
    try {
       if ( nextLine[4].length() > 1 ) {

          final JCas jcas = JCasFactory.createJCas();
          jcas.setDocumentText( nextLine[4] );
          SimplePipeline.runPipeline( jcas, pipelineTesting.getUMLPipeline() );

          for ( IdentifiedAnnotation entity : JCasUtil.select( jcas,
                IdentifiedAnnotation.class ) ) {
             if ( entity.getOntologyConceptArr() != null ) {
                add.append( entity.getCoveredText() + "," );
             }
          }


This works properly, but processing takes about 40 seconds per line.
How can I decrease the processing time?

I have 1 lakh (100,000) records (lines) in the CSV file.

Please give me a solution and an example.
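
One likely contributor to the 40 seconds per line is that the loop hands
the AnalysisEngineDescription from getUMLPipeline() to
SimplePipeline.runPipeline for every row, so the whole pipeline (models,
dictionary) is instantiated again each time. The sketch below shows only the
build-once, reuse-per-row pattern in plain Java so that it runs without
cTAKES on the classpath. TermEngine, buildEngine and the loads counter are
hypothetical stand-ins, not cTAKES or uimaFIT classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for an annotation engine that is expensive to build. */
interface TermEngine {
    List<String> process(String text);
}

class ReuseSketch {
    /** Counts how often the expensive setup ran; for illustration only. */
    static int loads = 0;

    /** Stand-in for loading models and the dictionary (the slow part). */
    static TermEngine buildEngine() {
        loads++;
        final List<String> dictionary = Arrays.asList("penicillin", "aspirin");
        return text -> {
            List<String> hits = new ArrayList<>();
            String lower = text.toLowerCase(Locale.ROOT);
            for (String term : dictionary) {
                if (lower.contains(term)) {
                    hits.add(term);
                }
            }
            return hits;
        };
    }

    /** Slow shape: the engine is rebuilt for every row. */
    static List<List<String>> rebuildPerRow(List<String> rows) {
        List<List<String>> results = new ArrayList<>();
        for (String row : rows) {
            results.add(buildEngine().process(row));
        }
        return results;
    }

    /** Faster shape: build once before the loop, then reuse for every row. */
    static List<List<String>> buildOnce(List<String> rows) {
        TermEngine engine = buildEngine();
        List<List<String>> results = new ArrayList<>();
        for (String row : rows) {
            results.add(engine.process(row));
        }
        return results;
    }
}
```

With uimaFIT 2.x the same shape would presumably be: call
AnalysisEngineFactory.createEngine( getUMLPipeline() ) once before the while
loop, create one JCas, and for each row call jcas.reset(),
jcas.setDocumentText(...) and SimplePipeline.runPipeline( jcas, engine ).
Whether this removes most of the 40 seconds depends on how much of that time
is initialization, which has not been measured here.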





regards,
shyam k.

On Thu, Jan 12, 2017 at 8:48 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Shyam,
>
> Have a look at the LinesFromFileCollectionReader class in ctakes-core.  It
> doesn't use csv files, but instead treats every newline character as a
> separator.
>
> Sean
>
> -----Original Message-----
> From: Ks Sunder [mailto:shyam...@gmail.com]
> Sent: Wednesday, January 11, 2017 1:29 AM
> To: dev@ctakes.apache.org
> Subject: Re: Allergy Annotator
>
> Hi All,
>
> My scenario is: read the string content from a CSV file and find the
> medical terms in that content using the cTAKES UMLS dictionary.
>
> As you suggested, I tried to find a CollectionReader in ctakes-core, but
> I did not find a clear solution. Please give me a solution and one example.
>
>
> regards,
> shyam k.
>
> On Thu, Dec 22, 2016 at 9:16 PM, Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Shyam,
> >
> > I think that the key to your first question
> > >   How can I execute a single function to run all these jobs in a
> > >   short time?
> > is in your code here:
> >
> > 1       final JCas jcas = JCasFactory.createJCas();
> > 2       jcas.setDocumentText( nextLine[0] );
> > 3       SimplePipeline.runPipeline(jcas, getUMLPipeline());
> >
> > What you probably want to do is replace lines #1 and #2 with a
> > CollectionReader, and then in #3 use a different SimplePipeline call
> > that runs the pipeline using the CollectionReader instead of a static
> > cas.
> >
> > There are commonly used CollectionReaders in ctakes-core.  The most
> > widely applicable is probably the FileTreeReader*, which reads a tree
> > of ascii files.  If you have some other source of text data then look
> > around the code for something that might fit and let the devlist know
> > if you can't find anything that fits your needs.
> >
> > I don't understand your second question:
> > > How can I find sentence-wise dictionary words from a string? Give me
> > > a solution for this.
> > Can you rephrase it and post to the devlist again?
> >
> > * one advantage that the FileTreeReader has is that it stores metadata
> > on the input file tree placement, which can then be reproduced by
> > output file writers like the html writer.
> >
> > Sean
> >
> >
> > -----Original Message-----
> > From: Ks Sunder [mailto:shyam...@gmail.com]
> > Sent: Thursday, December 22, 2016 2:33 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Allergy Annotator
> >
> > Hi All,
> >
> > I have done the below code for finding medical terms from String
> > information.
> >
> > step 1 :
> > public static AnalysisEngineDescription getUMLPipeline() throws
> > ResourceInitializationException, URISyntaxException{
> >    AggregateBuilder builder = new AggregateBuilder();
> >    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
> >    builder.add(SentenceDetector.createAnnotatorDescription());
> >    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
> >    builder.add(POSTagger.createAnnotatorDescription());
> >    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
> >    builder.add(LvgAnnotator.createAnnotatorDescription());
> >
> >      try {
> >          builder.add( AnalysisEngineFactory.createEngineDescription(
> >               DefaultJCasTermAnnotator.class,
> >               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
> >               "org.apache.ctakes.typesystem.type.textspan.Sentence",
> >               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
> >               ExternalResourceFactory.createExternalResourceDescription(
> >                     FileResourceImpl.class,
> >                     FileLocator.locateFile(
> >                           "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) )
> >         ) );
> >      } catch ( FileNotFoundException e ) {
> >         e.printStackTrace();
> >         throw new ResourceInitializationException( e );
> >      }
> >
> >    return builder.createAggregateDescription();
> >  }
> > step 2:
> >
> > final JCas jcas = JCasFactory.createJCas();
> > jcas.setDocumentText( nextLine[0] );
> > SimplePipeline.runPipeline(jcas, getUMLPipeline());
> >
> > for ( IdentifiedAnnotation entity : JCasUtil.select( jcas,
> > IdentifiedAnnotation.class ) ) {
> >
> >          if(entity.getOntologyConceptArr() != null){
> >
> >         add.append(entity.getCoveredText()+ ",");
> >
> >          }
> > }
> >
> >
> >
> >
> >
> > It is working fine.
> >
> > But I have two queries:
> >
> > 1. In step 1, I add the annotators one by one, and loading all of
> >    them takes a long time.
> >    How can I execute a single function to run all these jobs in a
> >    short time?
> >
> > 2. How can I find sentence-wise dictionary words from a string? Give me
> >    a solution for this.
> >
> >
> > Please give me solutions for these issues.
> >
> >
> >
> > regards,
> > shyam k.
> >
> > On Thu, Dec 8, 2016 at 1:59 AM, Mullane, Sean *HS <
> > sp...@hscmail.mcc.virginia.edu> wrote:
> >
> > > I'm reviving this thread with reference to negation detection. I
> > > previously posted about this to the User list but this is probably a
> > > more appropriate venue.
> > >
> > > The way the sentences are split on ":" makes the negation annotator
> > > miss negation in lists of this form:
> > >
> > > Hyperlipidemia:  Yes
> > > Hypercholesterolemia:  No
> > > Chronic Renal Insufficiency:  N/A
> > >
> > > I tried reversing order and removing ":"s and found that the
> > > negation for Hypercholesterolemia is detected when in this form:
> > >
> > > Yes Hyperlipidemia
> > > No Hypercholesterolemia
> > > N/A Chronic Renal Insufficiency
> > >
> > > Our notes have quite a few places with this sort of list where good
> > > negation detection is important, but I haven't had very good results.
> > > The sentence segmenter sees this as 12 separate sentences, but I would
> > > think proper behavior would be to consider this as 6 sentences
> > > (breaking sentences on line break but not on colons). I see previous
> > > discussion on the list about the sentence segmenter breaking on
> > > newlines but little regarding colons. I would think in most cases it
> > > would be more useful not to break on ":". Or is there an overriding
> > > reason for the current behavior?
> > > If changing the sentence segmenter isn't an option, is there a
> > > different way to configure the negation detection annotator that
> > > would avoid this issue?
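
One workaround that does not touch the sentence detector is to read such
lists directly, line by line. The following is a minimal plain-Java sketch,
not part of cTAKES and not a configuration of its negation annotator. It
assumes each list entry sits on its own line in the form
"Term: Yes|No|N/A"; the class, enum and method names are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueListSketch {
    enum Status { AFFIRMED, NEGATED, NOT_APPLICABLE }

    /** One parsed list entry: the term before the colon and its status. */
    static final class Item {
        final String term;
        final Status status;

        Item(String term, Status status) {
            this.term = term;
            this.status = status;
        }
    }

    // One entry per line: a term, a colon, then Yes, No or N/A.
    private static final Pattern ENTRY = Pattern.compile(
            "\\s*([^:]+?)\\s*:\\s*(yes|no|n/a)\\s*", Pattern.CASE_INSENSITIVE);

    /** Returns the entries found; lines that do not fit the shape are skipped. */
    static List<Item> parse(String note) {
        List<Item> items = new ArrayList<>();
        for (String line : note.split("\\R")) {
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                continue;
            }
            String value = m.group(2).toLowerCase(Locale.ROOT);
            Status status;
            if (value.equals("yes")) {
                status = Status.AFFIRMED;
            } else if (value.equals("no")) {
                status = Status.NEGATED;
            } else {
                status = Status.NOT_APPLICABLE;
            }
            items.add(new Item(m.group(1), status));
        }
        return items;
    }
}
```

In a pipeline, the resulting status could then be copied onto the annotation
that covers the term, but that step is outside this sketch.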
> > >
> > > Thanks,
> > > Sean
> > >
> > >
> > >
> > > Hi,
> > >
> > > I am interested in the design decision of the sentence detector.
> > >
> > > Why does it split a sentence of the form "WORD1: WORD2 WORD3." into
> > > two sentences "WORD1:" and "WORD2 WORD3."? Do other components of
> > > cTAKES require such a sentence splitting?
> > >
> > > It would seem to me that it should remain one sentence. For example,
> > > the smoking status detector has its own SentenceAdjuster that merges
> > > some of such sentences back into one, because of this design.
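
The merging idea can be sketched independently of cTAKES. The plain-Java
example below is not the smoking status SentenceAdjuster; it is a simplified
illustration that takes sentence spans as (begin, end) character offsets and
merges a span ending in ":" with the span that follows it on the same line.
Span and mergeHeaderSpans are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceMergeSketch {
    /** A sentence as character offsets into the document (end is exclusive). */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        String text(String doc) {
            return doc.substring(begin, end);
        }
    }

    /** Merges "KEY:" spans with the next span unless a line break separates them. */
    static List<Span> mergeHeaderSpans(String doc, List<Span> sentences) {
        List<Span> merged = new ArrayList<>();
        int i = 0;
        while (i < sentences.size()) {
            Span current = sentences.get(i);
            boolean endsWithColon = current.text(doc).trim().endsWith(":");
            if (endsWithColon && i + 1 < sentences.size()) {
                Span next = sentences.get(i + 1);
                boolean sameLine = doc.substring(current.end, next.begin).indexOf('\n') < 0;
                if (sameLine) {
                    merged.add(new Span(current.begin, next.end));
                    i += 2;
                    continue;
                }
            }
            merged.add(current);
            i++;
        }
        return merged;
    }
}
```

For "ALLERGIES: PENICILLIN, WHEAT" this yields a single span, so annotations
inside the value part share a sentence with the word "ALLERGIES".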
> > >
> > > Thanks, Tomasz
> > >
> > > ________________________________________
> > > From: Finan, Sean [sean...@childrens.harvard.edu]
> > > Sent: Friday, July 10, 2015 3:20 PM
> > > To: de...@ctakes.apache.org
> > > Subject: RE: Allergy Annotator
> > >
> > > Hi Tom,
> > >
> > > It is exactly because the sentence detector splits "KEY:" from "VALUE"
> > > that I didn't suggest using sentences. Instead, I would just iterate
> > > over the whole cas collection of medication events and attempt to
> > > match allergy phrases ("allergic to medication") against the note
> > > text spanning from event.begin-15 to event.end+15, or whatever window
> > > size you prefer.
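
A minimal plain-Java sketch of that window check, assuming the medication
events are available as character offsets into the note. The MedicationSpan
class and the cue pattern are illustrative assumptions, not cTAKES API; in a
real annotator the offsets would come from the medication annotations'
begin and end values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AllergyWindowSketch {
    /** Hypothetical stand-in for a medication annotation (end is exclusive). */
    static final class MedicationSpan {
        final int begin;
        final int end;

        MedicationSpan(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Cue words that suggest an allergy statement near the medication.
    private static final Pattern ALLERGY_CUE = Pattern.compile(
            "\\ballerg(?:y|ies|ic)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns the medications that have an allergy cue within the given window. */
    static List<MedicationSpan> findAllergyMentions(
            String note, List<MedicationSpan> medications, int window) {
        List<MedicationSpan> hits = new ArrayList<>();
        for (MedicationSpan med : medications) {
            // Clamp the window to the bounds of the note.
            int from = Math.max(0, med.begin - window);
            int to = Math.min(note.length(), med.end + window);
            if (ALLERGY_CUE.matcher(note.substring(from, to)).find()) {
                hits.add(med);
            }
        }
        return hits;
    }
}
```

A fixed character window can cut a cue word in half or reach into a
neighbouring statement, so the window size needs tuning against real notes.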
> > >
> > > Sean
> > >
> > > -----Original Message-----
> > > From: Tom Devel [mailto:deve...@gmail.com]
> > > Sent: Friday, July 10, 2015 4:12 PM
> > > To: de...@ctakes.apache.org
> > > Subject: Re: Allergy Annotator
> > >
> > > Sean and Dima, these are great suggestions, thanks so far.
> > >
> > > Sean, when looping over medication events as you say, I can see how
> > > it is possible to take the textspan.Sentence of this
> > > MedicationMention, and then do a regex check for the phrase structure
> > > as Dima said.
> > >
> > > But instead of textspan.Sentence, you mention "see if any is included
> > > in a phrase".
> > > What cTAKES/UIMA class is related to this?
> > >
> > > If I used textspan.Sentence, it would work for "The patient is
> > > allergic to penicillin.", but cTAKES splits "ALLERGIES: PENICILLIN,
> > > WHEAT" into two sentences, so that the MedicationMentions here would
> > > not be in the same sentence as the word "ALLERGIES".
> > >
> > > Thanks again, Tom
> > >
> > > On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean <
> > > sean...@childrens.harvard.edu>
> > > wrote:
> > >
> > > Hi Dima, Tom,
> > >
> > > I was thinking the same as Dima's first solution. Iterate through
> > > the medication events and see if any is included in a phrase as
> > > mentioned in Tom's original email. Each phrase structure would have
> > > to be specified beforehand. However, assigning appropriate CUIs
> > > would require having a lookup table for each medication allergy. I
> > > think that would be the simplest solution.
> > >
> > > Sean
> > >
> > > -----Original Message-----
> > > From: Dligach, Dmitriy [mailto:dmit...@childrens.harvard.edu]
> > > Sent: Friday, July 10, 2015 2:50 PM
> > > To: cTAKES Developer list
> > > Subject: Re: Allergy Annotator
> > >
> > > Hi Tom,
> > >
> > > If the patterns are pretty simple, you could just add a few rules on
> > > top of the cTAKES dictionary lookup output. Something of the kind
> > > "allergic to <medication>" or "allergies: <medication1>,
> > > <medication2>, <substance1>, ...".
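
As a rough illustration of such rules, here is a plain-Java sketch that works
on raw text rather than on the dictionary lookup output. It assumes a small,
hypothetical set of known substance names standing in for the annotations
cTAKES would supply; the two patterns mirror the "allergic to <medication>"
and "allergies: <medication1>, <medication2>" shapes. None of the names
below are cTAKES API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllergyRuleSketch {
    // Rule 1: "allergies: <item1>, <item2>, ..." up to the end of the line or a period.
    private static final Pattern ALLERGY_LIST = Pattern.compile(
            "\\ballergies\\s*:\\s*([^\\n.]+)", Pattern.CASE_INSENSITIVE);

    // Rule 2: "allergic to <item>", limited here to a single-word item.
    private static final Pattern ALLERGIC_TO = Pattern.compile(
            "\\ballergic\\s+to\\s+([A-Za-z-]+)", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the lower-cased substances that appear in an allergy pattern.
     * knownSubstances must contain lower-case names.
     */
    static List<String> findAllergens(String text, Set<String> knownSubstances) {
        Set<String> found = new LinkedHashSet<>();
        Matcher list = ALLERGY_LIST.matcher(text);
        while (list.find()) {
            for (String item : list.group(1).split(",")) {
                addIfKnown(item, knownSubstances, found);
            }
        }
        Matcher single = ALLERGIC_TO.matcher(text);
        while (single.find()) {
            addIfKnown(single.group(1), knownSubstances, found);
        }
        return new ArrayList<>(found);
    }

    private static void addIfKnown(String raw, Set<String> known, Set<String> found) {
        String candidate = raw.trim().toLowerCase(Locale.ROOT);
        if (known.contains(candidate)) {
            found.add(candidate);
        }
    }
}
```

Multi-word substances and coordinated phrases ("allergic to penicillin and
sulfa") are not handled by these two patterns and would need further rules.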
> > >
> > > If these patterns are hard to express as rules, you should consider
> > > a machine learning based sequence labeling route (e.g. something
> > > similar to the cTAKES chunker).
> > >
> > > Dima
> > >
> > > -- Dmitriy (Dima) Dligach, Ph.D. Boston Children's Hospital and
> > > Harvard Medical School (617) 651-0397
> > >
> > > On Jul 10, 2015, at 13:40, Tom Devel <deve...@gmail.com> wrote:
> > >
> > > Sean,
> > >
> > > It would be a wider net, such that if an allergy is mentioned in the
> > > clinical note, this is captured in the corresponding
> > > IdentifiedAnnotation (or alternatively, if the IdentifiedAnnotation
> > > class should not be changed with a new attribute, in a separate
> > > allergy annotation).
> > >
> > > This annotator would then have to of course run after the clinical
> > > pipeline has run and discovered all IdentifiedAnnotations.
> > >
> > > I am familiar with writing UIMA/cTAKES annotators, but not sure how
> > > a new ML method could be integrated here for detecting allergies. Do
> > > you have any thoughts about how to approach this in general?
> > >
> > > Thanks, Tom
> > >
> > > On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <
> > > sean...@childrens.harvard.edu>
> > > wrote:
> > >
> > > Hi Tom,
> > >
> > > Are you interested in catching all allergies or just a few specific
> > > allergies for a study? If you are only concerned with a few then
> > > there is a (possibly) simple solution. If you are interested in
> > > throwing a wider net then I think that a new module would need to be
> > > created; does anybody reading this have an ML or regex style module?
> > >
> > > Sean
> > >
> > > -----Original Message-----
> > > From: Tom Devel [mailto:deve...@gmail.com]
> > > Sent: Friday, July 10, 2015 12:42 PM
> > > To: de...@ctakes.apache.org
> > > Subject: Allergy Annotator
> > >
> > > Hi,
> > >
> > > I would like to use/extend cTAKES to detect allergies.
> > >
> > > In the cTAKES publication (2010)
> > >
> > > http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/
> > >
> > > there is the mention that: "Allergies to a given medication are
> > > handled by setting the negation attribute of that medication to
> > > 'is negated'."
> > >
> > > However, in a post here in 2014 (RE: Allergy Indication) it is said
> > > that cTAKES does not have a module for allergy discovery.
> > >
> > > 1. What is the current status of allergy detection in cTAKES?
> > >
> > > 2. I did some testing: while cTAKES discovers concepts about
> > > allergies ("wheat allergy" is found as C0949570), using "ALLERGIES:
> > > PENICILLIN, WHEAT" or "The patient is allergic to penicillin." does
> > > not give the penicillin or wheat annotations an allergy status.
> > >
> > > How would I go about detecting these allergy mentions?
> > >
> > > Thanks, Tom
> > >
> > >
> >
>
