Regarding the miss on “cm” in #2, check the “minimumSpan” parameter in the dictionary XML descriptor or the uimaFIT wiring, depending on which you are using. If I recall correctly, the default minimum span is 3 characters, so a 2-character term like “cm” is never looked up; you can reduce the value to 2 if desired.
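To illustrate why a minimum span of 3 would skip “cm”, here is a toy sketch of a length filter applied before a dictionary lookup. It is not cTAKES code: the function name, the toy dictionary, and the whitespace tokens are all hypothetical, and it only assumes that the filter compares the character length of a candidate span against the minimum.

```python
def lookup_candidates(tokens, dictionary, minimum_span=3):
    """Return (token, concept) pairs for tokens that pass the length filter.

    Hypothetical illustration only; this does not mirror the cTAKES internals.
    """
    hits = []
    for token in tokens:
        # Spans shorter than minimum_span are skipped before any lookup.
        if len(token) < minimum_span:
            continue
        concept = dictionary.get(token.lower())
        if concept is not None:
            hits.append((token, concept))
    return hits


# Toy dictionary built from the two terms discussed in this thread.
TOY_DICTIONARY = {
    "cm": "Cutaneous Mastocytosis",
    "gbm": "Glioblastoma",
}

TOKENS = ["GBM", "5.5", "cm", "X", "2.6", "cm"]

# Assumed default of 3: "cm" is filtered out, "GBM" is still matched.
default_hits = lookup_candidates(TOKENS, TOY_DICTIONARY)

# Lowered to 2: both occurrences of "cm" now reach the dictionary.
lowered_hits = lookup_candidates(TOKENS, TOY_DICTIONARY, minimum_span=2)
```

With the minimum lowered to 2, short abbreviations such as “cm” become candidates again, which is the behavior the regular pipeline showed.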
Cheers,
Britt

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

> On Jun 21, 2015, at 2:45 PM, Miller, Timothy <timothy.mil...@childrens.harvard.edu> wrote:
>
> Sean wrote the fast version and may be able to answer your specific
> questions. But in general, the fast dictionary does not match performance
> exactly -- it is not implementing an equivalent search and it has different
> indexing methods. We are happy to receive reports of what seem like bugs,
> though, any new software is likely to have some. What I will say is that I
> know Sean has run some (as yet unpublished) experiments and we believe that
> in the aggregate the new system output is at least as high quality as the
> older one.
> Tim
>
> ________________________________________
> From: Oranit Dror [ora...@algotec.co.il]
> Sent: Sunday, June 21, 2015 4:37 AM
> To: dev@ctakes.apache.org
> Subject: The fast dictionary pipeline vs. the regular one
>
> Hello,
>
> I am using ctakes 3.2.2 with the regular pipeline. Recently, I have tested
> the fast dictionary pipeline and indeed it is much faster.
> However, I have encountered with several quality differences in the returned
> annotations. For example:
>
> 1. With the fast pipeline, the term "GBM" is annotated as "glioblastoma
> multiforme", while in the regular pipeline it is annotated as "glioblastoma".
> Note that according to the UMLS DB, the concept of "GBM" is "glioblastoma"
> and "glioblastoma multiforme" is mapped to a narrower concept.
>
> 2. The word "cm" in a phrase like "5.5 cm X 2.6 cm" is annotated by the
> regular pipeline as "Cutaneous Mastocytosis", while in the fast pipeline it
> is not annotated as a medical term (as expected and as in UMLS).
>
> Any explanation for the differences?
>
> Thank you,
> Oranit.