Hi Eugenia. I may be wrong, but that XML definition is out of date (which is why it is commented out). Through the piper mechanism you have a different choice. Here follows a bit more. I hope some of it is useful....
Highly specific identification of terms is difficult, and I am working on some infrastructure to help capture values properly (not only lab values), but it will take a long time as I'm just doing it for fun. Your problem, though, seems more like a dictionary issue. I won't pretend to be an expert or to have tried out every possibility, but I'll give you a few tips.

The important thing to know is that, for me at least, cTAKES is not a finished product but an eternal work in progress. It takes years of experimentation and configuration. First you need to understand what specific terms and contexts your physicians are using, and whether the punctuation is clean enough that you can work with sentences or whether you need to go down to the chunk level.

In the UMLS Dictionary Lookup mechanism, the WindowAnnotation param is probably something you can supply in a piper file; it is the FQN of a class that extends Annotation. You could create your own Annotation & Annotator, or you could try using a Chunk annotator upstream of the UMLS lookup. The piper creator helps you do that. Then you would add the FQN of a Chunk to the window param of your UMLS lookup annotator. I used it a long time ago and, from what I remember, it basically tries to identify clauses within sentences. By doing this, especially with the Overlap Annotator, you'd prevent the lookup from spilling across clauses within a sentence.

You may want to play with the SentenceDetectorAnnotatorBIO instead of the SentenceDetector to see which gets you the most workable sentences. And for the end-of-sentence symbols you asked about, you may want to look at the file EndOfSentenceScannerImpl.java.

Customizing the dictionary usually means adding a synonym for each wording that represents a context in which your term will be found. Now, in your specific example of a monocyte count procedure vs a monocyte count result, these are not just distinct in SNOMED terms but also have distinct CUIs. Here are the two canonical terms with their CUIs as I found them; each has its synonyms.

    INSERT INTO PREFTERM VALUES(750880,'Monocyte count result')      (TUI 34)
        SYNONYMS  count monocytes
        SNO       *365631001*

    INSERT INTO PREFTERM VALUES(200637,'Monocyte count procedure')   (TUI 59)
        SYNONYMS  monos, monocyte count
        SNO       67776007, *365631001*

As you can see, these synonyms are woefully insufficient. Not only have the synonyms blurred the distinction you were looking for, but the SNOMED mapping overlaps the two concepts (the starred code appears under both). This was probably done as an expedient, but from an informatics perspective you are right: this is incorrect.

Check out how a row like this works:

    INSERT INTO CUI_TERMS VALUES(CUI,INDEX,COUNT,'<context with keyword>','<keyword>')

You can add these rows to match the language used by your physicians or in your forms. I had to do a fair bit of juggling to get what we needed, and it's a job that's never finished. The way I save my changes is to produce sed files of the deletions, changes, and additions made to the standard dictionary, and to archive those rather than the dictionary itself, which is quite large.

I hope this helps.

Peter

On Fri, Dec 4, 2020 at 5:07 PM Monogyiou, Eugenia <eugenia.monogy...@nttdata.com> wrote:

> Thank you all for the support!
> Sean, Kean the labValueFinder works as described so thanks for pointing that out!
>
> Peter, I will ask for your help with the LookupWindow if you could please spare a bit more time... I have located the UmlsOverlapLookupAnnotator file, thank you for that.
>
> I have located in the UmlsLookupAnnotator (in ctakes-dictionary-lookup-fast)
>
>     <name>windowAnnotations</name>
>     <value>
>         <!-- LookupWindowAnnotation is supposed to be a refined Noun Phrase -->
>         <!--<string>org.apache.ctakes.typesystem.type.textspan.LookupWindowAnnotation</string>-->
>         <!-- In some instances LookupWindowAnnotation is missing tokens and Sentence can be used -->
>         <string>org.apache.ctakes.typesystem.type.textspan.Sentence</string>
>     </value>
>
> I have gone through various java and typesystem files but I am not sure where I can find all the potential options for the Lookup Window and where/how I can set these. Also, if you could please let me know where in the code it is possible to see what symbols are considered "end-of sentence". I have noticed that ":" sometimes defines the end of a sentence but I haven't located anything relevant in the code ...
>
> Peter says :
> > Sometimes you need to further customize your dictionary.
> (can you please elaborate ?)
>
> Many thanks in advance,
>
> Kind Regards,
>
> Eugenia Monogyiou | NTT Data UK
>
> -----Original Message-----
> From: Peter Abramowitsch <pabramowit...@gmail.com>
> Sent: 03 December 2020 18:54
> To: dev@ctakes.apache.org
> Subject: Re: Disambiguation --alignment with SNOMED
>
> I have this issue a lot. There are many moving parts. Sometimes it can be resolved by using the widest window in the DictionaryLookup or sometimes the TermOverlap lookup annotator. Sometimes you need to further customize your dictionary.
>
> The problem arises when there isn't enough context to whittle down the lookup to the correct SNOMED entity. Or there isn't a synonym entry in the Dictionary that maps to the widest context in your texts. If you look at how the UMLS SNO_RX dictionary is structured you'll see how it can happen.
>
> For starters, look at the raw XMI and see all the entries in the UmlsArray that were selected even if later, only the wrong one entry surfaced.
>
> Another issue is the LabValueFinder. It has settings that allow it to clone procedures into lab values or vice versa (I can't remember). This can lead to a lot of duplication
>
> Peter
>
> On Thu, Dec 3, 2020 at 2:23 PM Monogyiou, Eugenia <eugenia.monogy...@nttdata.com> wrote:
>
> > Hello,
> >
> > I think I have hit a wall in terms of applying disambiguation in the cTakes context. I have come across the following example where what I consider to be a lab result (Monocyte Count) is picked up as a procedure, apparently, in alignment with UMLS
> >
> >     coding Scheme = SNOMED, Code = 67776007, CUI = C0200637, TUI = T059, preferredText = "Monocyte Count Procedure"
> >     coding Scheme = SNOMED, Code = 365631001, CUI = C0200637, TUI = T059, preferredText = "Monocyte Count Procedure"
> >
> > While they share the CUI (at UMLS level, due to the reconciliation of different ontologies), they are quite different concepts. 67776007 stands for "Monocyte count (procedure)" while 365631001 stands for "Finding of monocyte count (finding)". So is it fair to say that cTakes is not fully aligned with SNOMED? Is there a rule on how such concepts may be merged under the same CUI? Would using YTEX resolve similar issues?
> >
> > And also I'm using cTakes 4.0.0 and the YTEX installation guide appears to be outdated - the patch download is missing, names of files missing etc. If YTEX is the answer are there any updated instructions? If it is not are you using other UIMA-friendly solutions?
> >
> > Many thanks in advance,
> > Eugenia
> >
> > Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data.
> > If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.
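
Coming back to the CUI_TERMS row format mentioned at the top of this thread: below is a minimal sketch of how such rows might be generated for the wordings found in your own notes. It rests on assumptions that are not confirmed anywhere in this thread: that INDEX is the zero-based token position of the keyword within the context, that COUNT is the number of tokens in the context, and that plain lowercased whitespace splitting is close enough to how the dictionary was tokenized. The helper names and the "absolute monocyte count" wording are hypothetical. Check a few existing rows in your dictionary script before trusting the output.

```python
def sql_quote(value: str) -> str:
    """Escape single quotes for use inside a SQL string literal."""
    return value.replace("'", "''")


def cui_terms_row(cui: int, phrase: str, keyword: str) -> str:
    """Build one CUI_TERMS INSERT line for a context phrase and its keyword.

    Hypothetical helper: the meaning of INDEX and COUNT is assumed, not
    taken from cTAKES documentation.
    """
    # ASSUMPTION: lowercased whitespace tokens approximate the dictionary's tokens.
    tokens = phrase.lower().split()
    key = keyword.lower()
    if key not in tokens:
        raise ValueError(f"keyword {keyword!r} is not a token of {phrase!r}")
    index = tokens.index(key)  # ASSUMPTION: zero-based position of the keyword
    count = len(tokens)        # ASSUMPTION: total number of tokens in the context
    text = " ".join(tokens)
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{index},{count},"
        f"'{sql_quote(text)}','{sql_quote(key)}')"
    )


if __name__ == "__main__":
    # Hypothetical wordings mapped to the 'Monocyte count result' CUI (750880).
    wordings = [
        (750880, "monocyte count result", "monocyte"),
        (750880, "absolute monocyte count", "monocyte"),
    ]
    for cui, phrase, keyword in wordings:
        print(cui_terms_row(cui, phrase, keyword))
```

Writing the generated lines to a file and archiving that file, rather than the whole dictionary, follows the same idea as keeping sed files of the changes.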