Hi Erin,

Yes, creating a customized dictionary is the way to go. You can prune by the semantic types of interest and then remove branches that are not relevant to your specific phenotype. I am not aware of a tool in cTAKES for building such a highly customized dictionary.
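As a rough illustration of the pruning idea, here is a minimal Python sketch. It is not part of cTAKES. It assumes you have already loaded two mappings yourself, concept to semantic types and child to parents (for example from the UMLS Metathesaurus files such as MRSTY.RRF for semantic types). The function name `prune_dictionary` and the toy identifiers like `C_A` are hypothetical; only the TUIs T047 (Disease or Syndrome) and T121 (Pharmacologic Substance) are real UMLS codes.

```python
from collections import defaultdict


def prune_dictionary(cui_to_tuis, child_to_parents, keep_tuis, excluded_roots):
    """Keep concepts with a wanted semantic type that are not under an excluded branch."""
    # Invert child -> parents into parent -> children so branches can be walked downward.
    parent_to_children = defaultdict(set)
    for child, parents in child_to_parents.items():
        for parent in parents:
            parent_to_children[parent].add(child)

    # Collect every concept at or below an excluded root (iterative walk, safe on cycles).
    excluded = set()
    stack = list(excluded_roots)
    while stack:
        cui = stack.pop()
        if cui in excluded:
            continue
        excluded.add(cui)
        stack.extend(parent_to_children.get(cui, ()))

    return {
        cui
        for cui, tuis in cui_to_tuis.items()
        if tuis & keep_tuis and cui not in excluded
    }


# Toy data with hypothetical concept identifiers (not real CUIs).
cui_to_tuis = {
    "C_A": {"T047"},  # disease, relevant to the phenotype
    "C_B": {"T047"},  # disease, root of an irrelevant branch
    "C_C": {"T047"},  # disease, child of C_B
    "C_D": {"T121"},  # drug, semantic type not wanted
}
child_to_parents = {"C_B": {"C_A"}, "C_C": {"C_B"}}

kept = prune_dictionary(cui_to_tuis, child_to_parents, {"T047"}, {"C_B"})
print(sorted(kept))
```

The surviving concepts would still need to be written out in whatever format your cTAKES dictionary lookup expects; that step is not shown here.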
You can also start with a few terms that you know are relevant to your phenotype and then find their synonyms in the UMLS. You can then walk a specific ontology further and take siblings and parents if you think they are relevant. Beyond that, there is the whole field of using word embeddings to find synonyms and related terms from unlabeled data, if you want to become really fancy :-) At this point, cTAKES does not implement any deep learning algorithms; in the future we are planning to release a bridge to Keras.

I hope this makes sense.

--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
guergana.sav...@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org

-----Original Message-----
From: Erin Nicole Gustafson [mailto:erin.gustaf...@northwestern.edu]
Sent: Wednesday, February 15, 2017 1:38 PM
To: dev@ctakes.apache.org
Subject: Phenotype-specific entities

Hi all,

I would like to be able to only identify entities that are relevant for some specific phenotype. One step towards achieving this would be to build a custom dictionary with a limited set of semantic types. However, this is not quite specific enough to only identify mentions related to one disease while ignoring those related to some other disease, for example.

Does cTAKES currently have a way to do this sort of filtering? Or, has anyone developed their own tools that they'd be willing to share?

Thanks,
Erin
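A note on the seed-term suggestion in the reply above: the expansion step could be sketched in Python roughly as follows. This is only an illustration under assumptions, not something cTAKES provides. It assumes the hierarchy and the synonym lists have already been loaded into plain dictionaries (for instance from the UMLS Metathesaurus files MRREL.RRF and MRCONSO.RRF), and the function name `expand_seeds` and all concept identifiers are hypothetical.

```python
def expand_seeds(seed_cuis, child_to_parents, cui_to_terms,
                 include_parents=True, include_siblings=True):
    """Expand seed concepts with their parents and siblings, then list the terms of each."""
    # Build parent -> children so siblings (other children of the same parent) can be found.
    parent_to_children = {}
    for child, parents in child_to_parents.items():
        for parent in parents:
            parent_to_children.setdefault(parent, set()).add(child)

    selected = set(seed_cuis)
    for seed in seed_cuis:
        parents = child_to_parents.get(seed, set())
        if include_parents:
            selected |= parents
        if include_siblings:
            for parent in parents:
                selected |= parent_to_children.get(parent, set())

    # Map each selected concept to its sorted synonym list for manual review.
    return {cui: sorted(cui_to_terms.get(cui, ())) for cui in sorted(selected)}


# Toy data with hypothetical concept identifiers (not real CUIs).
child_to_parents = {
    "C_ASTHMA": {"C_LUNG"},
    "C_COPD": {"C_LUNG"},
    "C_FLU": {"C_INFECTION"},
}
cui_to_terms = {
    "C_ASTHMA": {"asthma", "bronchial asthma"},
    "C_COPD": {"COPD"},
    "C_LUNG": {"lung disease"},
    "C_FLU": {"influenza"},
}

candidates = expand_seeds({"C_ASTHMA"}, child_to_parents, cui_to_terms)
for cui, terms in candidates.items():
    print(cui, terms)
```

The output is meant as a candidate list to review by hand, since parents and siblings are only worth keeping when they are actually relevant to the phenotype.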