[umls-similarity] Practical large coverage configuration

[email protected] [umls-similarity] Tue, 08 Jul 2014 12:40:14 -0700

Hi
 

 I have some questions about indexing and configuration.


 I have tried running UMLS::Similarity with the following configuration:
 

 SAB :: include MSH, RXNORM, ICD9CM, NCI, SNOMEDCT_US

 REL :: include PAR, CHD
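
 To make it easier to try smaller or larger source subsets, the configuration
 above could be generated from lists rather than edited by hand. This is only a
 minimal sketch: the helper name `render_config` is hypothetical and not part of
 UMLS::Similarity; only the `SAB :: include` / `REL :: include` line format is
 taken from the configuration shown above.

```python
def render_config(sources, relations):
    """Return config text using the 'SAB :: include' / 'REL :: include' line format.

    Hypothetical helper for illustration; not part of UMLS::Similarity.
    """
    if not sources or not relations:
        raise ValueError("need at least one source and one relation")
    sab_line = "SAB :: include " + ", ".join(sources)
    rel_line = "REL :: include " + ", ".join(relations)
    return sab_line + "\n" + rel_line + "\n"


# The configuration from this message.
full_config = render_config(
    ["MSH", "RXNORM", "ICD9CM", "NCI", "SNOMEDCT_US"],
    ["PAR", "CHD"],
)

# A smaller variant, e.g. to time indexing on fewer sources first.
small_config = render_config(["MSH", "SNOMEDCT_US"], ["PAR", "CHD"])

print(full_config)
print(small_config)
```

 Each string can then be written to its own file and indexed separately, so the
 indexing cost of adding a source can be measured one step at a time.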

 

 
 I ran the indexing on a fairly powerful machine (16-core CPU with 64 GB of 
RAM). I let the indexing run for a week; it occupied more than 500 GB and was 
still running. This is understandable, considering that I have added a large 
number of sources and that the graph size would grow exponentially.
 

 From previous threads I understand that SNOMEDCT alone takes a day. I can 
definitely afford to let the indexing run longer if that lets me add more 
sources.

 

 I would like broader concept coverage and therefore want to add more sources. 
What is the best compromise for including more sources while keeping the 
indexing time reasonable?
 

 Also, what exact configuration was used for the following paper?
 Pedersen, T., Pakhomov, S. V. S., Patwardhan, S., & Chute, C. G. (2007). 
Measures of semantic similarity and relatedness in the biomedical domain.

 

 Your input would be very helpful.
 

 Chaitanya.
