Re: The fast dictionary pipeline vs. the regular one

2015-06-22 Thread britt fitch
Regarding the miss on “cm” in #2, you might want to check the dictionary 
XML descriptor or uimaFIT wiring, depending on which you are using, for the 
parameter “minimumSpan”. If I recall correctly, the default minimum span is 3 
characters, but you can reduce it to 2 if desired.

Cheers,

Britt


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

> On Jun 21, 2015, at 2:45 PM, Miller, Timothy wrote:
> 
> Sean wrote the fast version and may be able to answer your specific 
> questions. But in general, the fast dictionary does not match the regular 
> one's output exactly: it does not implement an equivalent search, and it 
> uses different indexing methods. We are happy to receive reports of what 
> seem like bugs, though; any new software is likely to have some. What I will 
> say is that I know Sean has run some (as yet unpublished) experiments, and we 
> believe that in the aggregate the new system's output is at least as high 
> quality as the older one's.
> Tim
> 
> 
> 
> From: Oranit Dror [ora...@algotec.co.il]
> Sent: Sunday, June 21, 2015 4:37 AM
> To: dev@ctakes.apache.org
> Subject: The fast dictionary pipeline vs. the regular one
> 
> Hello,
> 
> I am using cTAKES 3.2.2 with the regular pipeline. Recently, I tested 
> the fast dictionary pipeline, and indeed it is much faster.
> However, I have encountered several quality differences in the returned 
> annotations. For example:
> 
> 
> 1.   With the fast pipeline, the term "GBM" is annotated as "glioblastoma 
> multiforme", while in the regular pipeline it is annotated as "glioblastoma".
> Note that according to the UMLS DB, the concept of "GBM" is "glioblastoma" 
> and "glioblastoma multiforme" is mapped to a narrower concept.
> 
> 
> 2.   The word "cm" in a phrase like "5.5 cm X 2.6 cm" is annotated by the 
> regular pipeline as "Cutaneous Mastocytosis", while in the fast pipeline it 
> is not annotated as a medical term (as expected and as in UMLS).
> 
> 
> Any explanation for the differences?
> 
> Thank you,
> Oranit.
> 
> 
> 





RE: The fast dictionary pipeline vs. the regular one

2015-06-22 Thread Finan, Sean
Hi all,

I’m glad that there continues to be interest in the fast alternative to the 
dictionary lookup and I welcome all testing.

GBM actually is Glioblastoma Multiforme, hence the “M”.  The WHO name is the 
abbreviated “Glioblastoma”, but as far as I can discern they are not 
different things.  If you check the Metathesaurus 2011AB, GBM brings up both 
Glioblastoma C0017636 and Glioblastoma Multiforme C1621958.  The first comes 
from MeSH and NCI, the second from CSP.  If you look at the definitions, they 
are synonymous: “malignant form of astrocytoma histologically characterized by 
pleomorphism of cells, nuclear atypia, microhemorrhage and necrosis; may arise 
in any region of the central nervous system, with a predilection for the 
cerebral hemispheres, basal ganglia, and commissural pathways.”  Mapping to a 
different CUI in the UMLS does not always mean that two terms are truly 
different concepts.  It often means that they came from two different source 
dictionaries, as in this case.  Also see 
https://en.wikipedia.org/wiki/Glioblastoma_multiforme

But I am a little confused: are you saying that you got only Glioblastoma 
Multiforme C1621958 and not Glioblastoma C0017636?  When I run it I get both.
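
To illustrate why one surface form can come back with more than one CUI, here is a minimal Python sketch. It is a toy, not the cTAKES lookup: the two CUIs and their sources are taken from the message above, but the table layout, the function names, and the choice of which rows list "gbm" as a synonym are assumptions made for illustration.

```python
from collections import defaultdict

# Toy rows of (CUI, source vocabulary, term text).
# ASSUMPTION: which source lists "gbm" as a synonym is made up here so that
# the example mirrors the "both CUIs come back" result described above.
ROWS = [
    ("C0017636", "MeSH", "glioblastoma"),
    ("C0017636", "NCI", "gbm"),
    ("C1621958", "CSP", "glioblastoma multiforme"),
    ("C1621958", "CSP", "gbm"),
]


def build_index(rows):
    """Map each lower-cased term text to the set of CUIs that list it."""
    index = defaultdict(set)
    for cui, _source, text in rows:
        index[text.lower()].add(cui)
    return index


def lookup(index, text):
    """Return every CUI whose synonym list contains the given text."""
    return sorted(index.get(text.lower(), ()))


INDEX = build_index(ROWS)
print(lookup(INDEX, "GBM"))
```

Because each source vocabulary can contribute its own concept for the same string, a lookup keyed on the text alone returns all of them; choosing one is a separate disambiguation step.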

Britt is correct (thank you): if you change the minimum span from its default 
of 3 to 2, you will get Cutaneous Mastocytosis C1136033 within “5.5 cm”.  The 
default is 3 rather than 2 to prevent obviously garbage returns such as 
Cutaneous Mastocytosis for every “cm”.  However, feel free to change it to 
fit your purposes.  2 characters is the minimum: you cannot look up 
1-character terms with the default dictionary.  You can do so with a custom 
dictionary if you like, which might be useful if you have just 1 or 2 
single-character terms.
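
The effect of the minimum span can be sketched in a few lines of Python. This is a toy filter under stated assumptions, not the cTAKES annotator: the `annotate` function, the `minimum_span` argument, and the two-entry dictionary are hypothetical, and only the CUIs come from this thread.

```python
import re

# Toy dictionary; both CUIs are the ones quoted in this thread.
TERMS = {"cm": "C1136033", "gbm": "C0017636"}


def annotate(text, minimum_span=3):
    """Return (begin, end, cui) for dictionary tokens of at least minimum_span characters."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", text):
        token = match.group().lower()
        if len(token) < minimum_span:
            continue  # too short: skipped before any dictionary lookup
        cui = TERMS.get(token)
        if cui is not None:
            hits.append((match.start(), match.end(), cui))
    return hits


print(annotate("5.5 cm X 2.6 cm"))                  # default span of 3
print(annotate("5.5 cm X 2.6 cm", minimum_span=2))  # span lowered to 2
```

With the default of 3 the two "cm" tokens are never looked up; lowering the value to 2 lets both through, which is exactly the unwanted Cutaneous Mastocytosis hit discussed above.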

Sean






RE: Apache cTAKES hosted demos and examples

2015-06-22 Thread Finan, Sean
Very cool.  It allows quick, reproducible testing.  For example:

[inline example not preserved in the archive]

This is extremely useful!

-----Original Message-----
From: Pei Chen [mailto:chen...@apache.org] 
Sent: Friday, June 19, 2015 11:33 AM
To: dev@ctakes.apache.org; u...@ctakes.apache.org
Subject: Apache cTAKES hosted demos and examples

There seems to be a significant interest in having a hosted demo and examples, 
so I started this index page along with initial code examples:

Index page:
http://healthnlp.github.io/examples/

Live demo:
http://52.24.118.198:8080/index.jsp

--Pei


RE: RareWord term

2015-06-22 Thread Finan, Sean
Hi Maite,
I hope to have a paper out on this soon, so I am keeping things kind of quiet 
about it - though one can always look at the database and code to get an idea 
of what it means.
For anything else in the module, you can look at the wiki page:

https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
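
As a rough illustration of the general idea behind the code comment quoted in the question ("Dictionary used to lookup terms by the most rare word within them"), here is a toy Python sketch. It is not the cTAKES implementation and makes no claim about how UmlsJdbcRareWordDictionary picks its rare word: here rarity is simply the word count within a small, made-up term list, and every name below is hypothetical.

```python
from collections import Counter, defaultdict

# Toy term list (ASSUMPTION: a real dictionary would hold far more terms).
TERMS = [
    "glioblastoma multiforme",
    "glioblastoma",
    "cutaneous mastocytosis",
    "systemic mastocytosis",
]


def build_rare_word_index(terms):
    """Index every term under the word that occurs least often across all terms."""
    counts = Counter(word for term in terms for word in term.split())
    index = defaultdict(list)
    for term in terms:
        words = term.split()
        # Ties are broken alphabetically so the result is deterministic.
        rare = min(words, key=lambda w: (counts[w], w))
        index[rare].append(words)
    return index


def find_terms(index, text):
    """Look up only tokens that are some term's rare word, then verify the full term."""
    tokens = text.lower().split()
    found = []
    for i, token in enumerate(tokens):
        for words in index.get(token, ()):
            start = i - words.index(token)
            if start >= 0 and tokens[start:start + len(words)] == words:
                found.append(" ".join(words))
    return found


INDEX = build_rare_word_index(TERMS)
print(find_terms(INDEX, "biopsy showed glioblastoma multiforme"))
```

In this sketch a common word such as "mastocytosis" is never an index key, so it triggers no candidate checks on its own; only the rarer word of each term does, which keeps the number of comparisons per token small.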

Sean

-----Original Message-----
From: Maite Meseure Hugues [mailto:meseure.ma...@gmail.com] 
Sent: Thursday, June 18, 2015 12:02 PM
To: dev@ctakes.apache.org
Subject: RareWord term

Hi everyone,

I am currently using UmlsJdbcRareWordDictionary and I would like to better 
understand how the rare word term is chosen. I found this comment, 
'Dictionary used to lookup terms by the most rare word within them', but no 
further explanation. Does anyone have any pointers?
Thank you in advance.

Maite


Re: RareWord term

2015-06-22 Thread Maite Meseure Hugues
Hi Sean,

Thank you for your response. I ran into this wiki page just after sending you
this question and saw that it was in progress.
Thank you.

Maite

>


Mvn package error

2015-06-22 Thread Zhiwen Li
Hi,
I tried to compile the 3.2.3 version of cTAKES and got the following error:
Tests in error:
 TestClearNLPPipeLine(org.apache.ctakes.dependency.parser.ae.util.TestClearNLPAnalysisEngines):
URI is not hierarchical
I realized this error was resolved before in this thread

But the same error has come up again since the svg-ctakes-resources-lvg2008
dependency was added in revision 1642706.
If I remove the dependency, basically restoring it to revision 1620359,
it compiles fine. But I am not sure whether this dependency is necessary.
I don't know why this specific lvg version is required after revision
1642706.
Please help to clarify.

Thanks,
Simon

-- 

Zhiwen Li

l...@udel.edu


RE: TimeLanes

2015-06-22 Thread Finan, Sean
Hi Maashu,

TimeLanes is currently a prototype GUI under development, and there is probably 
no information about it on the web.  It is in the sandbox because it isn't part 
of the cTAKES release and is missing much-needed functionality.  For instance, 
it should display basic information about the patient and note (name, birth 
date, note date), but such things are often in structured data or some custom 
header of the note.  Right now TimeLanes does not fetch them at all (that will 
require custom readers) and just displays "Dan Testing".

If you want to run it, the main class is 
org.chboston.cnlp.timeline.gui.main.TimelineMain.  Upon startup it will 
display "open a note".  You can use the "Open" button or drag a file into the 
box.  Unfortunately, it does not yet run cTAKES (coming soon), so you need to 
give it an annotated note (Protégé or Anafora) or an .xmi file.  Using an .xmi 
would probably be easiest, as you can create it with cTAKES.  You can watch an 
outdated video here:
https://www.youtube.com/watch?v=Kp9YE0o3urU&feature=youtu.be

Sean

-----Original Message-----
From: maa...@gmail.com [mailto:maa...@gmail.com] 
Sent: Friday, June 12, 2015 1:18 PM
To: dev@ctakes.apache.org
Subject: TimeLanes

Hi All,

I've just started working with cTAKES and was curious about TimeLanes.  I found 
it in the sandbox here:

https://svn.apache.org/repos/asf/ctakes/sandbox/timelanes/

But I'm lost on how to actually use it.  I've googled around but there seems to 
be very little information on it.

Can anyone point me in the right direction?

Thanks in advance!

Cheers,

-Maashu

--
"If you are immune to boredom, there is literally nothing you cannot 
accomplish."

-David Foster Wallace


RE: TimeLanes

2015-06-22 Thread Savova, Guergana
The cTAKES temporal component is in the main release. You can get the system 
output, but as Sean said TimeLanes does not consume it yet.

A demo of the cTAKES temporal component can be found in Getting Started -> 
Demos. Pei just put it up there, thank you very much, Pei!
--Guergana





RE: TimeLanes

2015-06-22 Thread Finan, Sean
Just for clarification, TimeLanes does consume ctakes output (.xmi), but it 
does not produce it.  In other words, you cannot hand it a plain text file and 
expect automatic processing.  Yet.




Re: TimeLanes

2015-06-22 Thread maa...@gmail.com
Thanks so much for your replies, Sean and Guergana!  I'm very excited for
the possibilities of this project, and hope to contribute in some small way
:)

Cheers,

-Maashu



-- 
"If you are immune to boredom, there is literally nothing you cannot
accomplish."

-David Foster Wallace