Regarding the miss on “cm” in #2, you might want to check the parameter 
“minimumSpan” in the dictionary XML descriptor or in the uimaFIT wiring, 
depending on which you are using. If I recall correctly, the default minimum 
span is 3 characters; however, you can reduce it to 2 if desired.
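To illustrate why a 2-character token such as "cm" can be skipped, here is a toy Python model of a minimum-span check. It is a hypothetical sketch, not the cTAKES implementation: the function lookup_terms, the toy dictionary, and the default of 3 (taken from the "if I recall correctly" above) are assumptions for illustration only. In cTAKES itself the value is set through the “minimumSpan” parameter in the descriptor or uimaFIT configuration.

```python
# Hypothetical, simplified model of a "minimumSpan" filter in dictionary lookup.
# This is NOT cTAKES code; the names and dictionary contents are made up.

DEFAULT_MINIMUM_SPAN = 3  # assumed default, in characters


def lookup_terms(tokens, dictionary, minimum_span=DEFAULT_MINIMUM_SPAN):
    """Return (token, concept) pairs for tokens found in the dictionary,
    skipping any token shorter than minimum_span characters."""
    hits = []
    for token in tokens:
        if len(token) < minimum_span:
            continue  # too short to be considered for lookup at all
        concept = dictionary.get(token.lower())
        if concept is not None:
            hits.append((token, concept))
    return hits


toy_dictionary = {"cm": "Cutaneous Mastocytosis", "gbm": "Glioblastoma"}
tokens = ["5.5", "cm", "X", "2.6", "cm", "GBM"]

# With the assumed default of 3, "cm" is never looked up.
print(lookup_terms(tokens, toy_dictionary))
# → [('GBM', 'Glioblastoma')]

# Lowering the minimum span to 2 lets "cm" reach the dictionary.
print(lookup_terms(tokens, toy_dictionary, minimum_span=2))
```

In this model the filter runs before the dictionary is consulted, so a short token is dropped regardless of whether the dictionary contains it.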

Cheers,

Britt


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

> On Jun 21, 2015, at 2:45 PM, Miller, Timothy 
> <timothy.mil...@childrens.harvard.edu> wrote:
> 
> Sean wrote the fast version and may be able to answer your specific 
> questions. In general, however, the fast dictionary does not reproduce the 
> regular dictionary's output exactly: it does not implement an equivalent 
> search, and it uses different indexing methods. We are happy to receive 
> reports of what seem like bugs, though; any new software is likely to have 
> some. What I will say is that I know Sean has run some (as yet unpublished) 
> experiments, and we believe that, in the aggregate, the new system's output 
> is at least as high quality as the older one's.
> Tim
> 
> 
> ________________________________________
> From: Oranit Dror [ora...@algotec.co.il]
> Sent: Sunday, June 21, 2015 4:37 AM
> To: dev@ctakes.apache.org
> Subject: The fast dictionary pipeline vs. the regular one
> 
> Hello,
> 
> I am using cTAKES 3.2.2 with the regular pipeline. Recently, I tested 
> the fast dictionary pipeline, and indeed it is much faster.
> However, I have encountered several quality differences in the returned 
> annotations. For example:
> 
> 1. With the fast pipeline, the term "GBM" is annotated as "glioblastoma 
> multiforme", while in the regular pipeline it is annotated as "glioblastoma".
> Note that according to the UMLS DB, the concept of "GBM" is "glioblastoma", 
> and "glioblastoma multiforme" is mapped to a narrower concept.
> 
> 2. The word "cm" in a phrase like "5.5 cm X 2.6 cm" is annotated by the 
> regular pipeline as "Cutaneous Mastocytosis", while in the fast pipeline it 
> is not annotated as a medical term (as expected, and as in UMLS).
> 
> Any explanation for the differences?
> 
> Thank you,
> Oranit.
