Hi Ryan,

That list didn't work for me: email added its own line endings and split the intended lines, so it ends up being 120 rows. For instance:
Esophagogastroduodenoscopy Endoscopic retrograde cholangiopancreatography with Ampullectomy

is 2 lines instead of 1.

I will write something new for you in a little bit and maybe we can figure this out.

Sean
________________________________________
From: Ryan Young <royo...@buffalo.edu>
Sent: Tuesday, March 31, 2020 10:13 PM
To: dev@ctakes.apache.org
Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

* External Email - Caution *

Hello Sean,

I was able to get cTAKES packaged. However, the output text file doesn't have the same number of lines as the input text file. For example, if the input text file is 10,000 lines long, the output text file ends up being 10,630 lines. This makes me think that there's another conditional statement (or two) which needs to be added to the end of SentenceFirstCuiWriter.java.

Here's the current version of SentenceFirstCuiWriter.java I am using:

public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {

   public void writeFile( final JCas jCas, final String outputDir,
                          final String documentId, final String fileName ) throws IOException {
      File cuiFile = new File( outputDir, fileName + "_cui.txt" );
      Map<Sentence, Collection<ProcedureMention>> sentenceMap
            = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
      List<Collection<ProcedureMention>> sortedSentenceProcedures
            = sentenceMap.entrySet()
                         .stream()
                         .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
                         .map( Map.Entry::getValue )
                         .collect( Collectors.toList() );
      try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
         for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
            ProcedureMention firstProcedure
                  = procedures.stream()
                              .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
                              .orElse( null );
            if ( firstProcedure == null ) {
               writer.write( "\n" );
            } else {
               String cui = OntologyConceptUtil.getCuis( firstProcedure )
                                               .stream()
                                               .findFirst()
                                               .orElse( "" );
               if ( cui.isEmpty() ) {
                  writer.write( "\n" );
               } else {
                  writer.write( cui + "\n" );
               }
            }
         }
      }
   }
}

Below is the piper file I am using:

// Piper
reader org.apache.ctakes.core.cr.FileTreeReader InputDirectory="C:\path\to\input\folder"
set ctakes.umlsuser=username ctakes.umlspw=password
load DefaultTokenizerPipeline
add POSTagger
load DictionarySubPipe
add SentenceFirstCuiWriter OutputDirectory="C:\path\to\output\folder"

If it helps, I have listed the first 100 lines of the input text file. Again, the expected output text file should be 100 lines (i.e., 100 CUIs) as well. However, the output text file has 103 lines (103 CUIs), 3 more CUIs than it should.

Input.txt

Colonoscopy with Polypectomy Esophagogastroduodenoscopy Colonoscopy Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration Esophagogastroduodenoscopy with Endoscopic ultrasound Esophagogastroduodenoscopy with Biopsy() Linear Endobronchial Ultrasound (EBUS) with Nav Bronch Bronchoscopy Endobronchial Ultrasound (EBUS) OR Esophagogastroduodenoscopy with Dilation Savary Esophagogastroduodenoscopy Esophagogastroduodenoscopy Endoscopic retrograde cholangiopancreatography with Ampullectomy Wide Local Excision Flap Local Cheek Skin Graft Full Thickness (FTSG) Esophagogastroduodenoscopy with Dilation Balloon Esophagogastroduodenoscopy with Biopsy() Excision Soft Tissue Tumor Axillary Node Dissection Wide Local Excision w Removal of Radioactive Seed Laparoscopic Partial Gastrectomy ZLumpectomy, with Sentinel lymph node Biopsy Sentinel Lymph Node Biopsy Excision Cysto with Pre-Op Ureteral Catheter Placement Diagnostic Laparoscopy Sigmoid Colectomy Salpingo Oophorectomy Laminectomy Cervical with Instrumentation Transanal Endoscopic Microsurgery Implantation Procedure Ommaya Reservoir Insertion with Axiem Suprahyoid Lymphadenectomy Procedure Transcervical Extended Mediastinal Lymphadenectomy (Transcervcl Extndd Medstnl Lymphadenectmy) Video Assisted Thorascopic Surgery with Lobectomy Intubating
Bronchoscopy Nerve Block Intercostal, Multiple Video Assisted Thorascopic Surgery with Wedge Resection Nerve Block Intercostal, Multiple Video Assisted Thorascopic Surgery with Decortication Esophagogastroduodenoscopy Colonoscopy Esophagogastroduodenoscopy with Endoscopic ultrasound Colonoscopy with Biopsy() Esophagogastroduodenoscopy with Endoscopic ultrasound Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration Esophagogastroduodenoscopy Colonoscopy Colonoscopy Esophagogastroduodenoscopy with Biopsy() Esophagogastroduodenoscopy with Esophageal Stent Placement Colonoscopy with Biopsy() Esophagogastroduodenoscopy with Biopsy() Esophagogastroduodenoscopy Colonoscopy Colonoscopy with Biopsy() Wide local excision with Removal of Seeds Sentinel lymph node Biopsys Wide Local Excision w Removal of Radioactive Seed Abscess Drainage Empyema Rib Resection Flap Latissimus Dorsi Thoracoplasty Removal of Foreign Body Laparoscopic Cholecystectomy Laparoscopic Liver Biopsy Laparotomy Salpingo Oophorectomy Resection Pelvic Abcess Ruptured Diverticulum Minimally Invasive Esophagectomy with Feeding J Ileostomy Diagnostic Hysteroscopy Dilation Curettage (D and C) Exploratory Laparotomy Lysis of Adhesions Bowel Resection End to End Anastomosis Take Down of Ostomy Robot Assisted Sigmoid Colon Resection Robot Assisted Right Colectomy Esophagogastroduodenoscopy with Dilation Balloon Segmentectomy (Thoracic) Nerve Block Intercostal, Multiple Video Assisted Thorascopic Surgery with Wedge Resection Intubating Bronchoscopy Nerve Block Intercostal, Multiple Esophagogastroduodenoscopy Colonoscopy Esophagogastroduodenoscopy Endoscopy Mucosal Resection Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration Esophagogastroduodenoscopy 
with Endoscopic ultrasound Fine needle aspiration Endoscopic retrograde cholangiopancreatography with Stent Change Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration Endoscopic retrograde cholangiopancreatography with Ampullectomy Esophagogastroduodenoscopy with Esophageal Stent Placement Colonoscopy Esophagogastroduodenoscopy with Biopsy() Colonoscopy with Biopsy() Esophagogastroduodenoscopy Esophagogastroduodenoscopy with Biopsy() Craniectomy Frontal withStealth Drainage of Abscess Craniotomy Excision Tumor Posterior Fossa Craniotomy Temporal Thyroid Lobectomy with Isthmusectomy Neck Exploration Mediastinal Exploration Video Assisted Thorascopic Surgery with Bullectomy Thyroid Lobectomy Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC) Dilation Curettage (D and C) with Hysteroscopy Open Biopsy Lymph Node Biopsy Excision Total abdominal hysterectomy Bilateral salpingo-oophorectomy with Radical Dissection For Debulking Exploratory Laparotomy Peritoneal Stripping Resection of Tumor Resection of Tumor Bowel Resection Omentectomy Laminectomy Lumbar Craniotomy Occipital with Axiem Cytoreduction Debulking Hyperthermic Intraperitoneal Chemotherapy (HIPEC) Exploratory Laparotomy Colectomy Partial Omentectomy Cystoscopy with Ureteral Stent Insertion Change Cystoscopy with Ureteral Cath Retrograde Pyelogramm Cystoscopy with TURP Retrograde Pyelogram Ureteroscopy Cystoscopy with Ureteral Cathm Colonoscopy with Polypectomy Esophagogastroduodenoscopy with Dilation Balloon Esophagogastroduodenoscopy with RFA (Halo) Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration Esophagogastroduodenoscopy with Endoscopic ultrasound Esophagogastroduodenoscopy with Endoscopic ultrasound Linear Endobronchial Ultrasound (EBUS) with Nav Bronch Linear Endobronchial Ultrasound (EBUS) with Nav Bronch Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration Endoscopic retrograde cholangiopancreatography 
with Ampullectomy Bronchoscopy with Endobronchial Ultrasound (EBUS) Microsuspension Laryngoscopy Selective Neck Dissection Wide Local Excision w Removal of Radioactive Seed Breast Re-Excision Bronchoscopy Robot Assisted Total Hysterectomy SO Craniotomy Parietal with ioMRI Bronchoscopy with Biopsy() Ex Laparotomy Total abdominal hysterectomy with Salpingo Oophorectomy Robot Assisted Prostatectomy Robot Assisted Pelvic Lymphadenectomy

Thank You,
Ryan Young

On Tue, Mar 31, 2020 at 10:44 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Ryan,
>
> You made some excellent progress. ctakes is a little complicated for new users - especially anybody that isn't familiar with Java.
>
> Since you are going to be running from a command line (via python) and have already done so successfully, we can just try to get you set up to repeat that process.
>
> In Eclipse, you should be able to run the maven "package" configuration.
>
> That will compile and build an installation similar to what you were using before.
>
> After you execute maven package,
> open the directory ctakes-distribution/target/
> There should be a .zip file named apache-ctakes-4.0.1-SNAPSHOT-bin
> That zip file contains a ctakes installation for Windows.
> Unzip the installation wherever you like - preferably without spaces in directory names.
>
> You should be able to treat this new installation just like you did the one downloaded from the ctakes website.
>
> Before you do all of that ... We should change a couple of things in that SentenceFirstCuiWriter to output blanks where procedures or cuis are not discovered for your snippets.
>
> >> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
> >>
> >>    public void writeFile( final JCas jCas, final String outputDir,
> >>                           final String documentId, final String fileName ) throws IOException {
> >>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
> >>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
> >>             = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
> >>       List<Collection<ProcedureMention>> sortedSentenceProcedures
> >>             = sentenceMap.entrySet()
> >>                          .stream()
> >>                          .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
> >>                          .map( Map.Entry::getValue )
> >>                          .collect( Collectors.toList() );
> >>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
> >>          for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
> >>             ProcedureMention firstProcedure
> >>                   = procedures.stream()
> >>                               .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
> >>                               .orElse( null );
> >>             if ( firstProcedure != null ) {
>
> ---------- Change the above line to
>
>             if ( firstProcedure == null ) {
>                writer.write( "\n" );
>             } else {
>
> >>                String cui
> >>                      = OntologyConceptUtil.getCuis( firstProcedure )
> >>                                           .stream()
> >>                                           .findFirst()
> >>                                           .orElse( "" );
> >>                if ( !cui.isEmpty() ) {
>
> --------- Change the above line to
>
>                if ( cui.isEmpty() ) {
>                   writer.write( "\n" );
>                } else {
>
> >>                   writer.write( cui + "\n" );
> >>                }
> >>             }
> >>          }
> >>       }
> >>    }
> >> }
>
> So, after
> 1. Editing the SentenceFirstCuiWriter
> 2. Running the maven package step
> 3. Unzipping your ctakes installation
>
> You should be able to
> 1. Run ctakes from command line like you did before
> 2. Use the custom piper file
> 3. Resolve the first-discovered procedure for a snippet on each line
> 4. Write file(s) with corresponding line-by-line cuis or empty lines where none are resolved
>
> Let me know if I missed anything.
>
> Sean
>
> ________________________________________
> From: Ryan Young <royo...@buffalo.edu>
> Sent: Monday, March 30, 2020 9:44 PM
> To: dev@ctakes.apache.org
> Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]
>
> * External Email - Caution *
>
> Hello Sean,
>
> I have run into some difficulty actually running the script you wrote (SentenceFirstCuiWriter.java). I spent the last week doing the following:
> 1.) Installed cTAKES developer version using Eclipse IDE
> 2.) Added the appropriate import statements at the beginning of SentenceFirstCuiWriter.java
> 3.) Placed SentenceFirstCuiWriter.java in this directory:
>     C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
> 4.) Successfully built and compiled cTAKES developer version
> 5.) Successfully ran the test configurations which were already in cTAKES in Eclipse (Run --> Run As --> Maven test)
>
> My main question is how do I run the cTAKES developer version from command line without running Eclipse or Maven?
>
> I found a post you made last year (
> https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Ddev_201907.mbox_-253C1563805239741.31947-2540childrens.harvard.edu-253E&d=DwIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=ilUJmT8axx_RhXR_47XCxeR_aqswpoVXkSF5HQAxASQ&s=dxIE3QRB6OI1CxljCVx7K9Lgih-ymSq-wou0LqCvkvk&e=
> ).
> You stated, *"You can put PipelineBuilder in any main(..) method and then start that main(..) from a command line just as you would any other java program. Just like any other java program, you need to have your $CLASSPATH set correctly and, for memory use, increase your maximum memory with -Xmx . These are VM options."*
>
> I think this is what I have to do. But, I am unsure of how to accomplish this exactly. What I have tried already is:
> 1.) Launch Command Prompt
> 2.) Change directory to where PipelineBuilder.java is located
>     cd C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
> 3.) Enter the following into Command Prompt
>     java org.apache.ctakes.core.pipeline.PiperFileRunner -p C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis C:\Users\Ryan\SkyDrive\Desktop\Output_Folder
>
> I receive the following error in Command Prompt:
>     Error: Could not find or load main class org.apache.ctakes.core.pipeline.PiperFileRunner
>
> I am probably missing something. Just not sure what exactly. I'm not too familiar with Java. The documentation I have been reading hasn't been as helpful since cTAKES is a much more complex project than the simple examples they provide.
>
> Lastly, I am using Windows 10.
>
> Thank You,
>
> Ryan Young
> MD/MBA Candidate
> Jacobs School of Medicine & Biomedical Sciences
>
> On Mon, Mar 23, 2020 at 3:28 PM Ryan Young <royo...@buffalo.edu> wrote:
>
> > Hello Sean,
> >
> > Wow! This was a lot more than I was anticipating! Thank you very much!
> >
> > To answer your questions...
> > • I am using Windows 10
> > • I have the Python script call a shell command to run a batch file. The batch file just contains the following line:
> >   "C:\cTAKES_4.0.0\bin\runPiperFile.bat" -p "C:\path\to\piper.piper"
> > • The Python script waits for the shell command to complete (i.e., when cTAKES is finished processing)
> > • The Python script will then parse the output text files and then continue on with the code
> >
> > Prior to calling cTAKES, the surgery list is in a Pandas dataframe. The workaround I had created was to save each line of the surgery list column in the dataframe to a different text file to make it easier for when I had to parse the output cTAKES text file.
> > As I had mentioned previously, I would like to have just 1 input text file and 1 output text file (as long as the output file can be easily parsed by Python).
> >
> > Regarding my coding background, I don't have much background in Java. However, a few years ago, I had no knowledge of Python either, but I was able to teach myself while in medical school.
> >
> > A few more questions for you...
> > 1.) Should I save the code you posted in the following location as a .jar file?
> >     C:\cTAKES_4.0.0\lib\SentenceFirstCuiWriter.jar
> >
> > 2.) Should I replace "add CuiLookupLister" with "add SentenceFirstCuiWriter" in the piper file or do I need both?
> >
> > 3.) If the SentenceFirstCuiWriter is unable to find a valid CUI, will it leave a blank, N/A, or NaN value? Having any of these values would definitely help when I have Python parse the output text file. When I have Python read the output text file, I would have it delete any dataframe rows with NaN or N/A in the CUI column.
> >
> > Thank you very much for your assistance!
> >
> > Ryan Young
> > MD/MBA Candidate
> > Jacobs School of Medicine & Biomedical Sciences
> >
> > On Mon, Mar 23, 2020 at 1:01 PM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> >
> >> Hi Ryan,
> >>
> >> Here is some code for a writer that will do what you want.
> >> To use it, get rid of those first two lines in the piper that I sent (set, reader).
> >> The default reader will work just fine, and it will allow you to process multiple surgery lists in one run.
> >>
> >> Then just add SentenceFirstCuiWriter to the end of your piper.
> >>
> >> Sean
> >>
> >>
> >> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
> >>
> >>    public void writeFile( final JCas jCas, final String outputDir,
> >>                           final String documentId, final String fileName ) throws IOException {
> >>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
> >>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
> >>             = JCasUtil.indexCovered( jCas, Sentence.class, ProcedureMention.class );
> >>       List<Collection<ProcedureMention>> sortedSentenceProcedures
> >>             = sentenceMap.entrySet()
> >>                          .stream()
> >>                          .sorted( Map.Entry.comparingByKey( DefaultAspanComparator.INSTANCE ) )
> >>                          .map( Map.Entry::getValue )
> >>                          .collect( Collectors.toList() );
> >>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
> >>          for ( Collection<ProcedureMention> procedures : sortedSentenceProcedures ) {
> >>             ProcedureMention firstProcedure
> >>                   = procedures.stream()
> >>                               .min( Comparator.comparingInt( ProcedureMention::getBegin ) )
> >>                               .orElse( null );
> >>             if ( firstProcedure != null ) {
> >>                String cui
> >>                      = OntologyConceptUtil.getCuis( firstProcedure )
> >>                                           .stream()
> >>                                           .findFirst()
> >>                                           .orElse( "" );
> >>                if ( !cui.isEmpty() ) {
> >>                   writer.write( cui + "\n" );
> >>                }
> >>             }
> >>          }
> >>       }
> >>    }
> >> }
> >>
> >> ________________________________________
> >> From: Ryan Young <royo...@buffalo.edu>
> >> Sent: Monday, March 23, 2020 11:02 AM
> >> To: dev@ctakes.apache.org
> >> Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]
> >>
> >> * External Email - Caution *
> >>
> >> Hello,
> >>
> >> I am a medical student who happened to come across cTAKES for a project I am working on. What I would like to do is take a list of surgeries in a text file and have cTAKES output what it determines to be the best UMLS code (CUI) for that particular line.
> >>
> >> Each line of the text file is independent of the others (i.e., each line should be read and analyzed separately). For example, here's my list of the surgeries (Surgery_List.txt):
> >> Colonoscopy with Polypectomy
> >> Esophagogastroduodenoscopy Colonoscopy
> >> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
> >>
> >> When I run the piper file (see below), I get the following output:
> >> Colonoscopy with Polypectomy
> >> "Colonoscopy"
> >> Procedure
> >> C0009378 colonoscopy
> >> "Polypectomy"
> >> Procedure
> >> C0521210 Resection of polyp
> >>
> >> Esophagogastroduodenoscopy Colonoscopy
> >> "Esophagogastroduodenoscopy"
> >> Procedure
> >> C0079304 Esophagogastroduodenoscopy
> >> "Colonoscopy"
> >> Procedure
> >> C0009378 colonoscopy
> >>
> >> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
> >> "Esophagogastroduodenoscopy"
> >> Procedure
> >> C0079304 Esophagogastroduodenoscopy
> >> "Endoscopic ultrasound"
> >> Procedure
> >> C0376443 Endoscopic Ultrasound
> >> "Endoscopic"
> >> Procedure
> >> C0014245 Endoscopy (procedure)
> >> "ultrasound"
> >> Procedure
> >> C0041618 Ultrasonography
> >> "Fine needle aspiration"
> >> Procedure
> >> C1510483 Fine needle aspiration biopsy
> >> "aspiration"
> >> Procedure
> >> C0349707 Aspiration-action
> >>
> >> Here's the piper file I have been using:
> >> reader org.apache.ctakes.core.cr.FileTreeReader InputDirectory="C:\path\to\input\folder"
> >> load DefaultTokenizerPipeline.piper SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml
> >> add ContextDependentTokenizerAnnotator
> >> add org.apache.ctakes.necontexts.ContextAnnotator
> >> addDescription POSTagger
> >> load ChunkerSubPipe.piper
> >> set ctakes.umlsuser=my_username ctakes.umlspw=my_password
> >> add org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml
> >> add property.plaintext.PropertyTextWriterFit OutputDirectory="C:\path\to\output\folder"
> >>
> >> The workaround I have developed is as follows...
> >> 1.) Save each line of Surgery_List.txt to separate text files
> >> 2.) Use a Python script to parse each individual text file to extract the first UMLS code (CUI) given in the text file
> >>
> >> The above method works fine when there's only 10 lines, but not so well when there's 40,000 lines in Surgery_List.txt.
> >>
> >> Ideally, I would like for Fast Dictionary Lookup to just return the top result for each line of Surgery_List.txt. For example, Output.txt would look just like this:
> >> C0009378
> >> C0079304
> >> C0079304
> >>
> >> Just for reference here's how UMLS codes correspond between Surgery_List.txt and Output.txt:
> >> C0009378 --> Colonoscopy with Polypectomy
> >> C0079304 --> Esophagogastroduodenoscopy Colonoscopy
> >> C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle aspiration
> >>
> >> Is there something I can add to the piper file to make this happen?
> >>
> >> Currently, I have the cTAKES user version installed, but I could install the developer version if need be. I would just need a little guidance on which Java script I would need to modify to get the desired results.
> >>
> >> Thank You,
> >>
> >> Ryan Young
> >> MD/MBA Candidate
> >> Jacobs School of Medicine & Biomedical Sciences
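
For reference, here is a minimal, self-contained sketch of the line-by-line idea discussed in this thread: one output row per input line, with a blank row where no procedure or CUI is resolved. It is plain Java and does not use cTAKES. It assumes the procedure mentions have already been grouped per input line (that grouping step is not shown), and `Mention`, `firstCui` and `rowsFor` are hypothetical names made up for illustration, not cTAKES classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class FirstCuiPerLine {

    /** Hypothetical stand-in for a procedure mention: a begin offset plus its CUIs. */
    static final class Mention {
        final int begin;
        final List<String> cuis;

        Mention(int begin, List<String> cuis) {
            this.begin = begin;
            this.cuis = cuis;
        }
    }

    /** Returns the first CUI of the earliest mention, or "" when nothing is resolved. */
    static String firstCui(List<Mention> mentions) {
        return mentions.stream()
                .min(Comparator.comparingInt((Mention m) -> m.begin))
                .flatMap(m -> m.cuis.stream().findFirst())
                .orElse("");
    }

    /** Builds exactly one row per input line, so row i always belongs to line i. */
    static List<String> rowsFor(List<List<Mention>> mentionsPerLine) {
        List<String> rows = new ArrayList<>();
        for (List<Mention> mentions : mentionsPerLine) {
            rows.add(firstCui(mentions));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Three input lines: two mentions, no mentions, one mention.
        List<List<Mention>> perLine = Arrays.asList(
                Arrays.asList(new Mention(17, Arrays.asList("C0521210")),
                              new Mention(0, Arrays.asList("C0009378"))),
                Collections.<Mention>emptyList(),
                Arrays.asList(new Mention(0, Arrays.asList("C0079304"))));
        for (String row : rowsFor(perLine)) {
            System.out.println(row);
        }
    }
}
```

Because `rowsFor` adds one row per entry, the result always has as many rows as there were input lines, which is what the Python side needs in order to pair row i of the output with line i of the surgery list.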