RE: The SegmentRegexAnnotator of Ytex
Thank you, Vijay. However, I am still encountering the crash.

Best,
Oranit.

-----Original Message-----
From: vijay garla [mailto:vnga...@gmail.com]
Sent: Monday, July 13, 2015 5:53 PM
To: dev@ctakes.apache.org
Subject: Re: The SegmentRegexAnnotator of Ytex

See https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide

best,
vj

On Mon, Jul 13, 2015 at 2:50 AM, Oranit Dror wrote:
> Hello,
>
> I am using cTAKES 3.2.2, and I have recently tried to apply the YTEX
> pipeline. In particular, I am interested in the SegmentRegexAnnotator of
> Ytex.
>
> My questions are:
>
> 1. When running the pipeline, an
> org.apache.uima.resource.ResourceInitializationException is thrown,
> probably due to a failure in the initialization of
> org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator. Below is the
> stack trace.
>
> 2. Where can I find information on how the SegmentRegexAnnotator
> works, especially where the list of segments is defined?
>
> Thank you,
> Oranit.
>
>
> The stack trace for the Ytex pipeline crash:
>
> 12 Jul 2015 09:47:52 ERROR RunEngine - Failed to create AE from xml descriptor :E:/Data/Views/oranit_nlp/subprod1/nlp/java/algotec-nlp/desc/desc/algotec-nlp/desc/analysis_engine/AggregateDiseaseYtexUMLSProcessorDescriptor.xml
> org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator" failed.
> (Descriptor: file:/E:/Program Files/apache-ctakes-3.2.2-rc2/desc/ctakes-ytex-uima/desc/analysis_engine/SegmentRegexAnnotator.xml)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
>     at com.algotec.nlp.RunEngine.createCasObjects(RunEngine.java:1399)
>     at com.algotec.nlp.RunEngine.ensureCasObjects(RunEngine.java:1373)
>     at com.algotec.nlp.RunEngine.analyze(RunEngine.java:954)
>     at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:128)
>     at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:103)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>     at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
>     at org.apache.catalina.core.ApplicationFilterChain.in
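The quoted trace is cut off before any "Caused by:" section. A UIMA ResourceInitializationException reporting "Initialization of annotator class ... failed" generally wraps whatever the annotator's initialize() actually threw, so the innermost cause is usually where the useful detail (a missing resource, a bad parameter, and so on) lives. Below is a minimal, self-contained sketch of walking a cause chain with the standard Throwable.getCause(); it is not part of cTAKES or UIMA, and the class name RootCauseDemo and the simulated exception messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not cTAKES/UIMA code): walks the cause chain of a
 * wrapper exception so that the underlying failure becomes visible.
 */
public class RootCauseDemo {

    /** Returns the throwable itself followed by each nested cause, outermost first. */
    public static List<Throwable> causeChain(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        Throwable current = t;
        // Stop at the end of the chain, or if a cause cycle would repeat an entry.
        while (current != null && !chain.contains(current)) {
            chain.add(current);
            current = current.getCause();
        }
        return chain;
    }

    /** Returns the innermost throwable in the chain. */
    public static Throwable rootCause(Throwable t) {
        List<Throwable> chain = causeChain(t);
        return chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Simulated nesting standing in for an annotator whose initialize() failed;
        // the messages are made up for illustration.
        Exception inner = new IllegalStateException("simulated: resource not found");
        Exception wrapper = new RuntimeException("Initialization of annotator class failed", inner);

        for (Throwable cause : causeChain(wrapper)) {
            System.out.println("caused by: " + cause);
        }
        System.out.println("root cause: " + rootCause(wrapper).getMessage());
    }
}
```

In a real pipeline one would pass the caught ResourceInitializationException to causeChain(); equivalently, the "Caused by:" lines at the bottom of the full printStackTrace() output carry the same information.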
periods and the interaction with PTB & Fast Dict Lookup.
Another question/topic, likely for Sean & Tim. Happy to get others’ feedback as well.

I am trying to identify gene-related information. It appears that the PTB tokenization logic, in places like the tokenizer and the dictionary building, will split a string into multiple tokens if it is not a number and contains a period. For example, given “22q11.2 deletion syndrome”:

PTB tokenizer: [22q11, .2, deletion, syndrome]
POS for the above term: [CD, CD, NN, NN]
Chunks for the above term: [B-NP, I-NP, I-NP, I-NP]

The same string is split differently, as [22q11, ., 2, deletion, syndrome], in the new dictionary module (RareWordTermMapCreator.getTokens). When the _rareWordTermMap gets created, it uses the first token as the key:

22q11=[org.apache.ctakes.dictionary.lookup2.term.RareWordTerm@37917c4d]

The period-split difference above (period alone vs. period + number) might be irrelevant here, because for the input “22q11.2 deletion syndrome” the lookup indices are [2,3]. The new lookup will ignore the incoming token “22q11” because it is tagged CD, and “.2” because it is a number. It looks like this concept cannot be identified unless CD is allowed as a lookup token POS. Even if it is allowed, though, I think the PTB rules might not be the best fit for gene locations.

Are there any thoughts/experiences regarding handling gene location mentions like this? Should the Fast Dict tokenization logic match the PTB tokenizer logic so that both produce the same components?

Let me know if I have misread any of these points. Since these items would likely cause large changes, I am looking to get some feedback before moving forward.

Cheers,
Britt

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com
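A deliberately simplified Java sketch of the two splitting behaviours and the POS-filtered, first-token-keyed lookup described in this message may make the interaction easier to see. It is hypothetical throughout: PeriodSplitDemo, its two regex rules and its lookup() method are illustrations only, not the actual PTB tokenizer, RareWordTermMapCreator.getTokens or Fast Dictionary Lookup code, all of which handle many more cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration (not cTAKES code) of the tokenization mismatch
 * and the POS-filtered lookup discussed in the message.
 */
public class PeriodSplitDemo {

    /** Simplified PTB-like rule: split before each period, so "22q11.2" -> "22q11", ".2". */
    public static final String PTB_LIKE = "(?=\\.)";

    /** Simplified dictionary-builder rule: the period is its own token, "22q11", ".", "2". */
    public static final String DICT_LIKE = "(?=\\.)|(?<=\\.)";

    private static boolean isNumber(String word) {
        return word.matches("\\d+(\\.\\d+)?");
    }

    /** Splits on whitespace, then splits non-numeric words containing a period using the given pattern. */
    public static List<String> split(String text, String periodPattern) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.indexOf('.') >= 0 && !isNumber(word)) {
                tokens.addAll(Arrays.asList(word.split(periodPattern)));
            } else {
                tokens.add(word);
            }
        }
        return tokens;
    }

    /** Returns true if any lookup-eligible token is a key in the term map; CD tokens are skipped unless allowCd. */
    public static boolean lookup(List<String> tokens, List<String> pos,
                                 Map<String, String> termMap, boolean allowCd) {
        for (int i = 0; i < tokens.size(); i++) {
            boolean eligible = allowCd || !"CD".equals(pos.get(i));
            if (eligible && termMap.containsKey(tokens.get(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "22q11.2 deletion syndrome";
        List<String> ptb = split(text, PTB_LIKE);
        List<String> dict = split(text, DICT_LIKE);
        System.out.println("PTB-like:  " + ptb);   // [22q11, .2, deletion, syndrome]
        System.out.println("dict-like: " + dict);  // [22q11, ., 2, deletion, syndrome]

        // Term map keyed on the first dictionary token, as described in the message.
        Map<String, String> termMap = new HashMap<>();
        termMap.put(dict.get(0), text);

        // POS tags reported for the PTB tokens; here both "22q11" and ".2" are CD.
        List<String> pos = Arrays.asList("CD", "CD", "NN", "NN");
        System.out.println("found (CD skipped): " + lookup(ptb, pos, termMap, false)); // false
        System.out.println("found (CD allowed): " + lookup(ptb, pos, termMap, true));  // true
    }
}
```

With CD tokens skipped, only deletion and syndrome are looked up, and neither is the key 22q11, so the term is never retrieved; allowing CD lets the first token match. That second case works only because both rules happen to agree on the first token; a key taken from a later piece of the split (".2" versus "." and "2") would not line up across the two tokenizations.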
Developer install guide needs some changes
Hello all,

I’ve been doing a fresh developer install and noticed a couple of things in the developer install guide that could use some updating. In particular:

The Subversive plugin for Maven (m2e) no longer seems to work or be maintained. Subclipse seems to be a functional replacement. You can install it from within Eclipse (Help | Install New Software…) by adding the following sites:

http://subclipse.tigris.org/update_1.10.x

Install everything for Subclipse, then add

http://subclipse.tigris.org/m2eclipse/latest/

and install everything for m2e integration.

It doesn’t seem to be necessary to download the Dictionaries and models separately; they seem to be downloaded automatically by Maven (in other words, skip steps 3-6 under Compile a release in Eclipse).

If this information could be included somewhere in the developer install guide, it may save others some time.

Thanks,
John
RE: Developer install guide needs some changes
John,

Would you mind updating the doc directly? If you send over your Confluence wiki ID (create an account if you don't have one), we can grant the necessary karma.

> It doesn't seem to be necessary to download the Dictionaries and models
> separately, they seem to be automatically downloaded by maven (in other
> words, skip steps 3-6 under Compile a release in Eclipse).

Yes, you are correct. If you're in the Eclipse IDE, the plugins will automatically download and unpack the dictionaries for you.

-----Original Message-----
From: John Mongan [mailto:john.mon...@ucsf.edu]
Sent: Tuesday, July 14, 2015 4:27 PM
To: dev@ctakes.apache.org
Subject: Developer install guide needs some changes

Hello all - I've been doing a fresh developer install and noted a couple things in the developer install guide that could perhaps use some updating. In particular:

The subversive plugin for maven (m2e) doesn't seem to work or be maintained any more. Subclipse seems to be a functional replacement. You can install it from within Eclipse (Help | Install Software...) by adding the following sites:

http://subclipse.tigris.org/update_1.10.x

install everything for subclipse, then add

http://subclipse.tigris.org/m2eclipse/latest/

and install everything for m2e integration.

It doesn't seem to be necessary to download the Dictionaries and models separately, they seem to be automatically downloaded by maven (in other words, skip steps 3-6 under Compile a release in Eclipse).

If this information could be included in the developer install guide somewhere it may help to save others some time.

Thanks,
John
ctakes umlsuserapprover authentication error
Hello,

I have installed cTAKES 3.2.2 using the instructions linked from the download page, but I am receiving the following error when I run runctakesCVD.sh and try to load AggregatePlaintextFastUMLSProcessor.xml:

14 Jul 2015 16:01:43 ERROR UmlsUserApprover - UMLS Account at https://uts.nlm.nih.gov/restful/isValidUMLSUser is not valid for user my_username with my_password

(I replaced my actual username with my_username and my actual password with my_password.) I verified the credentials by logging into the UMLS website, copying and pasting the username and password from the error message into the login fields.

While looking around to see if anyone else had this problem, I noticed that it was an issue in version 3.2.1, but that it had been fixed in version 3.2.2. I can load and run AggregatePlaintextUMLSProcessor.xml without any errors. In case it is relevant, my UMLS license was approved earlier today.