Ok, let me see if I understand your current setup: cTAKES 4.0 fast lookup; the dictionary configuration file points to a SQL Server database; SQL Server uses cui_terms (cui, rword, rindex, tcount, text) and perhaps other secondary tables ...
Now that I write out the column names, I have a thought. Is it possible that for some term the number in tcount does not match the number of non-whitespace 'words' in the text column? If those numbers are off then you will have problems similar to the one that you are seeing.

If you are populating your own table you need to make sure that the text is being properly tokenized. For instance, the term "alpha-beta" should have text "alpha - beta" with tcount 3. There are some exceptions to the dash-separation rule and a few oddities.

Sean

-----Original Message-----
From: Jeff Headley [mailto:jeffun...@gmail.com]
Sent: Tuesday, October 03, 2017 8:52 AM
To: dev@ctakes.apache.org
Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]

I updated our pom to use the same hsqldb version as what I saw in the ctakes lib folder. The data coming in is from a SQL Server database.

On Tue, Oct 3, 2017 at 8:45 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> I don't think that a custom dictionary should cause a null pointer
> exception on that line unless you have an odd null character in text
> or something of that ilk.
>
> One thing that changed in ctakes 4.0 is the version of hsqldb that is
> being used for the dictionary database. I don’t know if that has
> anything to do with your problem, but it may be causing others.
> What is the source of your custom dictionary? There may be a better
> way to populate a database.
>
> Sean
>
> -----Original Message-----
> From: Jeff Headley [mailto:jeffun...@gmail.com]
> Sent: Tuesday, October 03, 2017 12:53 AM
> To: dev@ctakes.apache.org
> Subject: Re: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
>
> Thank you Sean. That helped to figure out what we did. Not quite sure
> where we went wrong but now at least we know the cause. So a long time
> ago in our project using ctakes, we emptied out the tables CUI_TERMS,
> RXNORM, PREFTERM, and TUI and then loaded them with the values we
> wanted.
> Worked great. Now in the new version the
> /desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml
> engine seems to be using
> /resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab
> and that seems to be where things went sideways. If I don't mess with
> the db and keep the original, no errors.
>
> So somewhere in this if statement at line 102 in DefaultJCasTermAnnotator:
>     if ( hitTokens[ hit ].equals( allTokens.get( i ).getText() )
>          || hitTokens[ hit ].equals( allTokens.get( i ).getVariant() ) ) {
>
> It's expecting to never have a null, and I suspect we are leaving
> something null somewhere that really shouldn't have nulls. If it's
> obvious where I've gone wrong, the assistance would be appreciated.
> Otherwise, I'll get it figured out eventually. I suspect it's possibly
> because we never did anything with the SNOMEDCT_US in the prior version.
>
> On Mon, Oct 2, 2017 at 10:47 AM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > I have no problem running on your example "DIDANOSINE, 250MG (PO
> > Capsule Delayed Release)" or any other text.
> >
> > I don't know how you are running ctakes through
> > com.clientproject.ctakes.processors.CommandLineProcessor, so I don't
> > know how closely the standard pipeline approximates yours.
> >
> > Sean
> >
> > -----Original Message-----
> > From: Jeff Headley [mailto:jeffun...@gmail.com]
> > Sent: Sunday, October 01, 2017 11:31 PM
> > To: dev@ctakes.apache.org
> > Subject: NPE after upgrade in DefaultJCASTermAnnotator [EXTERNAL]
> >
> > After upgrading our project to version 4, we are getting an NPE from cTAKES.
> > The text that was being processed was DIDANOSINE, 250MG (PO Capsule
> > Delayed Release), though it seems to be happening to us no matter
> > what text we submit. The stack trace is below. Any help would be
> > appreciated as I'm at a loss as to what we might be doing wrong if
> > this is not a bug in cTAKES.
> >
> > Thank you,
> > Jeff
> >
> > Oct 01, 2017 11:10:16 PM org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl processAndOutputNewCASes(273)
> > SEVERE: Exception occurred
> > org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
> >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:412)
> >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:314)
> >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:570)
> >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:412)
> >     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:344)
> >     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:265)
> >     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
> >     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:284)
> >     at com.clientproject.ctakes.processors.CommandLineProcessor.processLine(CommandLineProcessor.java:163)
> >     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
> >     at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
> >     at com.clientproject.ctakes.processors.CommandLineProcessor.run(CommandLineProcessor.java:114)
> >     at com.clientproject.ctakes.App.main(App.java:109)
> > Caused by: java.lang.NullPointerException
> >     at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.isTermMatch(DefaultJCasTermAnnotator.java:102)
> >     at org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator.findTerms(DefaultJCasTermAnnotator.java:79)
> >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.findTerms(AbstractJCasTermAnnotator.java:236)
> >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.processWindow(AbstractJCasTermAnnotator.java:219)
> >     at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.process(AbstractJCasTermAnnotator.java:156)
> >     at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
> >     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:396)
> >     ... 12 more
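
P.S. On the tcount question raised at the top of this reply: one way to check a custom table is to compare each row's tcount with the number of whitespace-separated tokens in its text column. What follows is only a rough sketch under assumptions, not cTAKES code. The class TcountCheck, the Row holder, the method names and the CUI values are all made up for illustration, and the rows are hard-coded; in practice they would be read from cui_terms. It also does not apply the tokenization rules themselves (such as dash separation), it only counts what is already in text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES.
final class TcountCheck {

    /** One cui_terms row, reduced to the columns that matter for this check. */
    static final class Row {
        final String cui;
        final int tcount;
        final String text;

        Row(String cui, int tcount, String text) {
            this.cui = cui;
            this.tcount = tcount;
            this.text = text;
        }
    }

    /** Counts whitespace-separated tokens; null or blank text counts as zero. */
    static int countTokens(String text) {
        if (text == null) {
            return 0;
        }
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns rows whose text is null or whose tcount differs from the token count. */
    static List<Row> findMismatches(List<Row> rows) {
        List<Row> bad = new ArrayList<>();
        for (Row row : rows) {
            if (row.text == null || countTokens(row.text) != row.tcount) {
                bad.add(row);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Made-up sample rows; the CUI values are placeholders.
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("C0000001", 3, "alpha - beta")); // consistent
        rows.add(new Row("C0000002", 1, "alpha - beta")); // tcount too small
        rows.add(new Row("C0000003", 1, null));           // null text
        for (Row row : findMismatches(rows)) {
            System.out.println(row.cui + " tcount=" + row.tcount + " text=" + row.text);
        }
    }
}
```

Any row this reports would be worth fixing or reloading before looking further at the annotator itself.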