RE: The SegmentRegexAnnotator of Ytex

2015-07-14 Thread Oranit Dror
Thank you, Vijay.
However, I am still encountering the crash.

Best,
Oranit.

-----Original Message-----
From: vijay garla [mailto:vnga...@gmail.com] 
Sent: Monday, July 13, 2015 5:53 PM
To: dev@ctakes.apache.org
Subject: Re: The SegmentRegexAnnotator of Ytex

see https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide

best,

vj

On Mon, Jul 13, 2015 at 2:50 AM, Oranit Dror  wrote:

> Hello,
>
> I am using cTAKES 3.2.2 and recently tried to run the YTEX
> pipeline. In particular, I am interested in the SegmentRegexAnnotator of
> YTEX.
>
> My questions are:
>
> 1. When running the pipeline, an
> org.apache.uima.resource.ResourceInitializationException is thrown,
> probably due to a failure in the initialization of
> org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator. Below is the
> stack trace.
>
> 2. Where can I find information on how the SegmentRegexAnnotator
> works, especially where the list of segments is defined?
>
> Thank you,
> Oranit.
>
>
> The stack trace for the Ytex pipeline crash:
>
> 12 Jul 2015 09:47:52 ERROR RunEngine - Failed to create AE from xml descriptor :E:/Data/Views/oranit_nlp/subprod1/nlp/java/algotec-nlp/desc/desc/algotec-nlp/desc/analysis_engine/AggregateDiseaseYtexUMLSProcessorDescriptor.xml
> org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.ytex.uima.annotators.SegmentRegexAnnotator" failed. (Descriptor: file:/E:/Program Files/apache-ctakes-3.2.2-rc2/desc/ctakes-ytex-uima/desc/analysis_engine/SegmentRegexAnnotator.xml)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:252)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:156)
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:387)
>     at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:254)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:431)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:375)
>     at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:185)
>     at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>     at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>     at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
>     at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:354)
>     at com.algotec.nlp.RunEngine.createCasObjects(RunEngine.java:1399)
>     at com.algotec.nlp.RunEngine.ensureCasObjects(RunEngine.java:1373)
>     at com.algotec.nlp.RunEngine.analyze(RunEngine.java:954)
>     at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:128)
>     at com.algotec.nlp.servlet.ReportNLPServlet.doPost(ReportNLPServlet.java:103)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>     at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
>     at org.apache.catalina.core.ApplicationFilterChain.in

periods and the interaction with PTB & Fast Dict Lookup.

2015-07-14 Thread britt fitch
Another question/topic likely for Sean & Tim. Happy to get others’ feedback as 
well.

I am trying to identify gene-related information.

It appears that the PTB tokenization logic in places like the tokenizer & 
dictionary building will split a string into multiple tokens if it is not a 
number and contains a period.

For example, given “22q11.2 deletion syndrome”:

PTB tokenizer: [22q11, .2, deletion, syndrome]
POS for the above term: [CD, CD, NN, NN]
Chunks for the above term: [B-NP, I-NP, I-NP, I-NP]

The same string creates a different split of [22q11, ., 2, deletion, syndrome] 
in the new dictionary module (RareWordTermMapCreator.getTokens).
When the _rareWordTermMap gets created, it uses the first token as the key: 
22q11=[org.apache.ctakes.dictionary.lookup2.term.RareWordTerm@37917c4d]
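
For anyone who wants to reproduce the two splits side by side, here is a toy sketch in Python. It is not cTAKES code: the two regular expressions and the function names ptb_like_tokens and dictionary_like_tokens are assumptions, chosen only to mimic the outputs quoted above for this one example.

```python
import re

# Hypothetical stand-in for the PTB-style split reported above:
# a period followed by digits stays together as one token (".2").
_PTB_LIKE = re.compile(r"\.\d+|[^\s.]+|\.")

# Hypothetical stand-in for the dictionary-side split reported above:
# every period becomes its own token.
_DICT_LIKE = re.compile(r"[^\s.]+|\.")


def ptb_like_tokens(text):
    """Split text the way the PTB tokenizer output above looks."""
    return _PTB_LIKE.findall(text)


def dictionary_like_tokens(text):
    """Split text the way the dictionary-side output above looks."""
    return _DICT_LIKE.findall(text)


term = "22q11.2 deletion syndrome"
print(ptb_like_tokens(term))
print(dictionary_like_tokens(term))
```

Both functions agree on the first token, "22q11", which is the key shown above; they differ only in how the ".2" part is split.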

The period-split difference above (period alone vs period + number) might be 
irrelevant here because for the input “22q11.2 deletion syndrome”, the lookup 
indices are [2,3].
The new lookup will ignore the incoming tokens “22q11” because it is tagged CD 
and “.2” because it is a number.

It looks like this concept cannot be identified unless CD is allowed as a 
lookup token POS.
Even if this is allowed though, in the case of gene locations I think the PTB 
rules might not be the best fit.
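
To make the two concerns concrete, here is a toy model in Python. It is a sketch under stated assumptions, not the cTAKES lookup: find_terms, the map layout, and the exact-token-sequence comparison are all hypothetical, and the real module may match differently. The sketch only assumes, as described above, that a term is keyed by its first token and that tokens with an excluded POS tag are never used as lookup keys.

```python
def find_terms(tokens, pos_tags, rare_word_map, excluded_pos):
    """Toy lookup: return names of terms whose token sequence starts at an
    eligible token and matches the text tokens exactly."""
    found = []
    for i, (token, tag) in enumerate(zip(tokens, pos_tags)):
        if tag in excluded_pos:
            continue  # excluded tokens are never used as lookup keys
        for name, term_tokens in rare_word_map.get(token, []):
            if tokens[i:i + len(term_tokens)] == term_tokens:
                found.append(name)
    return found


NAME = "22q11.2 deletion syndrome"
text_tokens = ["22q11", ".2", "deletion", "syndrome"]  # PTB-style split
text_pos = ["CD", "CD", "NN", "NN"]

# The same term keyed by its first token, split the dictionary way and the PTB way.
dict_split = {"22q11": [(NAME, ["22q11", ".", "2", "deletion", "syndrome"])]}
ptb_split = {"22q11": [(NAME, ["22q11", ".2", "deletion", "syndrome"])]}

print(find_terms(text_tokens, text_pos, dict_split, {"CD"}))  # CD excluded
print(find_terms(text_tokens, text_pos, dict_split, set()))   # CD allowed, splits differ
print(find_terms(text_tokens, text_pos, ptb_split, set()))    # CD allowed, splits agree
```

In this toy model only the last call returns the term. Allowing CD alone is not enough when the dictionary and the text are tokenized differently, which is why the tokenization question below matters.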

Are there any thoughts/experiences on handling gene location mentions like 
this?
Should the Fast Dict tokenization logic match the PTB tokenizer logic to 
produce the same components?

Let me know if I have misread any of these points. Since these items would 
likely cause large changes, I am looking to get some feedback before moving 
forward.

Cheers,

Britt


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com





Developer install guide needs some changes

2015-07-14 Thread John Mongan
Hello all —

I’ve been doing a fresh developer install and noted a couple things in the 
developer install guide that could perhaps use some updating.

In particular:

The Subversive plugin for Maven (m2e) no longer seems to work or be 
maintained. Subclipse seems to be a functional replacement. You can install it 
from within Eclipse (Help | Install New Software…) by adding the following sites:

http://subclipse.tigris.org/update_1.10.x 

Install everything for Subclipse, then add

http://subclipse.tigris.org/m2eclipse/latest/

and install everything for m2e integration.

It doesn’t seem to be necessary to download the dictionaries and models 
separately; they seem to be downloaded automatically by Maven (in other words, 
skip steps 3-6 under Compile a release in Eclipse).

If this information could be included in the developer install guide somewhere 
it may help to save others some time.

Thanks,

John




RE: Developer install guide needs some changes

2015-07-14 Thread Chen, Pei
John,
Would you mind updating the doc directly?  If you send over your Confluence 
wiki ID (creating an account if you don't have one), we can grant the necessary karma.

> It doesn't seem to be necessary to download the Dictionaries and models 
> separately, they seem to be automatically downloaded by maven (in other 
> words, skip steps 3-6 under Compile a release in Eclipse).
Yes, you are correct.  If you're in the Eclipse IDE, the plugins will 
automatically download and unpack the dictionaries for you.

-----Original Message-----
From: John Mongan [mailto:john.mon...@ucsf.edu] 
Sent: Tuesday, July 14, 2015 4:27 PM
To: dev@ctakes.apache.org
Subject: Developer install guide needs some changes

Hello all -

I've been doing a fresh developer install and noted a couple things in the 
developer install guide that could perhaps use some updating.

In particular:

The subversive plugin for maven (m2e) doesn't seem to work or be maintained any 
more. Subclipse seems to be a functional replacement. You can install it from 
within Eclipse (Help | Install Software...) by adding the following sites:

http://subclipse.tigris.org/update_1.10.x

install everything for subclipse, then add

http://subclipse.tigris.org/m2eclipse/latest/

and install everything for m2e integration.

It doesn't seem to be necessary to download the Dictionaries and models 
separately, they seem to be automatically downloaded by maven (in other words, 
skip steps 3-6 under Compile a release in Eclipse).

If this information could be included in the developer install guide somewhere 
it may help to save others some time.

Thanks,

John




ctakes umlsuserapprover authentication error

2015-07-14 Thread Taylor, Stuart
Hello,

I have installed cTAKES 3.2.2 using the instructions linked to on the download 
page, but I am currently receiving the following error when I run 
runctakesCVD.sh and try to load AggregatePlaintextFastUMLSProcessor.xml:


14 Jul 2015 16:01:43 ERROR UmlsUserApprover - UMLS Account at 
https://uts.nlm.nih.gov/restful/isValidUMLSUser is not valid for user 
my_username with my_password


where I replaced my actual username with my_username and my actual password 
with my_password. I verified the credentials by logging into the UMLS website, 
copying and pasting the username and password from the error message into the 
login fields.
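
A further check that does not depend on cTAKES is to send the same credentials to the endpoint named in the error message by hand. The Python sketch below only builds the form-encoded request body and sends nothing. The field names licenseCode, user and password are assumptions that this thread does not confirm, and build_validation_body is a hypothetical helper. It also shows that characters such as & or = in a password must be percent-encoded in a form body; whether that is related to the error above is not known.

```python
from urllib.parse import urlencode

# Endpoint copied from the error message above.
UMLS_URL = "https://uts.nlm.nih.gov/restful/isValidUMLSUser"


def build_validation_body(user, password, license_code):
    """Build a form-encoded body for a manual credential check.

    The field names are assumptions, not confirmed by this thread.
    """
    return urlencode(
        {"licenseCode": license_code, "user": user, "password": password}
    )


# Placeholder values only; the license code here is not a real one.
print(UMLS_URL)
print(build_validation_body("my_username", "p&ss=word", "LICENSE"))
```

The resulting string can be posted to the URL with any HTTP client to see what the service returns for those credentials.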

When poking around to see if anyone else had this problem, I noticed that it 
was an issue with version 3.2.1 but had been fixed in version 3.2.2.

I can load and run AggregatePlaintextUMLSProcessor.xml without any errors.

In case it is relevant, my UMLS license was approved earlier today.