Blacklist format

Actually I got it inverted, it's:

semantic_code1, semantic_code2,...|text1
semantic_code1, semantic_code2,...|text2

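To make the corrected line shape concrete, here is a minimal Python sketch of how such entries could be read and applied. It is not the cTAKES code: the function names are hypothetical, and NE_TYPE_ID_DISORDER is an assumed code name modeled on the NE_TYPE_ID_DRUG example in the quoted message. It also assumes the case sensitive and case insensitive lists are kept as two separate tables.

```python
# Hypothetical helpers, not cTAKES code. They read lines shaped like
#   semantic_code1, semantic_code2,...|text

def parse_blacklist_line(line):
    """Return (set of semantic codes, text) for one blacklist line."""
    codes_part, sep, text = line.partition("|")
    if not sep:
        raise ValueError(f"missing '|' separator: {line!r}")
    codes = {c.strip() for c in codes_part.split(",") if c.strip()}
    return codes, text.strip()


def build_blacklist(lines, case_sensitive):
    """Map each semantic code to the set of texts suppressed for it."""
    table = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        codes, text = parse_blacklist_line(line)
        key = text if case_sensitive else text.lower()
        for code in codes:
            table.setdefault(code, set()).add(key)
    return table


def is_suppressed(code, text, sensitive, insensitive):
    """True if text is blacklisted for code in either table."""
    return (text in sensitive.get(code, set())
            or text.lower() in insensitive.get(code, set()))


# 'bed' is not a disorder, while 'BED' could still be one.
# NE_TYPE_ID_DISORDER is an assumed name, used only for illustration.
sensitive = build_blacklist(["NE_TYPE_ID_DISORDER|bed"], case_sensitive=True)
insensitive = build_blacklist(["NE_TYPE_ID_DRUG|jasmine"], case_sensitive=False)
```

With these tables, is_suppressed("NE_TYPE_ID_DISORDER", "bed", sensitive, insensitive) is true, while the same call with "BED" is false, which is the behaviour wanted for the acronym case.
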
Peter

On Tue, Aug 4, 2020 at 4:16 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:

> Ok Thanks Jeff. I'm glad I wasn't missing something important.
>
> There already is a blacklist text mechanism which suppresses
> identification of specific text by clinical domain.
> Looking at the code it collects entries like
> cTakesSemanticCode,texta,textb,textc
> NE_TYPE_ID_DRUG, jasmine, coriander, bleach
> There's a case sensitive list and a case insensitive one.
>
> So I will try that.
> In one of my examples, I'll say that 'bed' is not a disorder, while 'BED'
> could be one.
>
> On Tue, Aug 4, 2020 at 2:12 PM Jeffrey Miller <jeff...@gmail.com> wrote:
>
>> Hi Peter,
>>
>> To your question about sno_rx_16ab I suspect that the CUI is new since
>> 2016, or if it existed in UMLS back then, it was not associated with a
>> term in snomed or rxnorm at that time.
>>
>> To those solutions, if you are able to use the trunk I know Sean said
>> there was a suppression text feature, otherwise in the past I have
>> removed the lines from the .script file
>>
>> I definitely think the acronym case sensitive feature would be great.
>>
>> Jeff
>>
>> On Tue, Aug 4, 2020 at 3:28 PM Peter Abramowitsch <pabramowit...@gmail.com> wrote:
>>
>> > Hi Jeff et al
>> >
>> > To take up the thread from a few days ago where a simple English word
>> > such as bed, soft, shop also maps into a legitimate but rarely used
>> > acronym and shows up in the same POS as a potentially interesting
>> > entity, what is the mechanism you would use to disambiguate?
>> >
>> > This problem only started since I constructed a SNO+RX+HGNC dictionary
>> > from the 2020A UMLS dump. Adding more TUIs where a more conventional
>> > word-sense of the target word occurs does not fix this problem.
>> >
>> > For instance, why does the sno_rx dictionary not contain this disease
>> > which aliases to "bed"?
>> >
>> > ucsf_dict_v1 $ grep 3159311 *.script
>> > *INSERT INTO CUI_TERMS VALUES(3159311,0,1,'bed','bed')*
>> > INSERT INTO CUI_TERMS VALUES(3159311,5,8,'myopia , high , with nonprogressive cone dysfunction','nonprogressive')
>> > INSERT INTO CUI_TERMS VALUES(3159311,0,3,'bornholm eye disease','bornholm')
>> > INSERT INTO CUI_TERMS VALUES(3159311,5,6,'x-linked cone dysfunction syndrome with myopia','myopia')
>> > INSERT INTO TUI VALUES(3159311,47)
>> > *INSERT INTO PREFTERM VALUES(3159311,'BORNHOLM EYE DISEASE')*
>> > INSERT INTO SNOMEDCT_US VALUES(3159311,718718009)
>> >
>> > sno_rx_16ab $ grep 3159311 *.script
>> > nada
>> >
>> > Solutions good or evil?
>> >
>> > - Strip the relevant lines out of the dict.script file?
>> > - Blacklist the text?
>> > - Add to my stopCUI list (a little feature I added)?
>> > - Some other configuration I don't know about?
>> >   For instance, is there a CUI:ACRONYM table?
>> >   I'm tempted to create one. This would require the matching term to
>> >   be present in upper case.
>> >
>> > Peter
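
For the first option in the quoted list (stripping the relevant lines out of the .script file, which Jeff says he has done in the past), a rough Python sketch follows. The function name is hypothetical, and the pattern assumes each CUI_TERMS insert sits on one line in the shape shown in the grep output above. It removes only the one offending term and leaves the other synonyms, the TUI and the preferred term for that CUI in place.

```python
import re


def strip_cui_term(lines, cui, term):
    """Drop CUI_TERMS inserts for this CUI whose matched text equals term.

    Hypothetical helper. Assumes one INSERT statement per line, shaped as
    INSERT INTO CUI_TERMS VALUES(cui,int,int,'text','rare word').
    """
    pattern = re.compile(
        rf"INSERT INTO CUI_TERMS VALUES\({cui},\d+,\d+,"
        rf"'{re.escape(term)}','[^']*'\)"
    )
    return [ln for ln in lines if not pattern.search(ln)]


# Sample lines copied from the grep output in the quoted message.
sample_lines = [
    "INSERT INTO CUI_TERMS VALUES(3159311,0,1,'bed','bed')",
    "INSERT INTO CUI_TERMS VALUES(3159311,0,3,'bornholm eye disease','bornholm')",
    "INSERT INTO TUI VALUES(3159311,47)",
]

kept = strip_cui_term(sample_lines, 3159311, "bed")
for line in kept:
    print(line)
```

Run against a copy of the .script file rather than the original, since the change cannot be undone from the dictionary alone.
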