Hello Chaitanya,

Given the size of the taxonomy and the number of links between nodes under the configuration you are using, I would suggest using the --realtime option rather than building the index. When I run experiments using the entire UMLS, this is usually what I do because of space and time issues. The --realtime option calculates the path information between the concepts on the fly rather than running a DFS through the taxonomy and pre-storing path information in an index.
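To make the trade-off concrete, here is a minimal Python sketch on a toy taxonomy. It is only an illustration of the idea, not the UMLS-Similarity implementation: the concept names, the PARENTS table, and both function names are hypothetical. The first approach walks up the PAR links only for the pair being queried; the second runs a DFS from the root and stores every root-to-concept path up front, which is where the space and time go on a large graph.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept maps to its parents (PAR relations).
PARENTS = {
    "root": [],
    "disease": ["root"],
    "heart_disease": ["disease"],
    "lung_disease": ["disease"],
    "drug": ["root"],
}


def ancestor_depths(concept):
    """Walk up the PAR links breadth-first, recording the distance to each ancestor."""
    depths = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in depths:
                depths[parent] = depths[node] + 1
                queue.append(parent)
    return depths


def shortest_path_length(c1, c2):
    """On-the-fly approach: edges on the shortest path through a shared ancestor."""
    d1 = ancestor_depths(c1)
    d2 = ancestor_depths(c2)
    shared = set(d1) & set(d2)
    if not shared:
        return None
    return min(d1[a] + d2[a] for a in shared)


def build_path_index():
    """Index approach: DFS from the root, storing every root-to-concept path."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    index = {}
    stack = [("root", ("root",))]
    while stack:
        node, path = stack.pop()
        index.setdefault(node, []).append(path)
        for child in children.get(node, []):
            stack.append((child, path + (child,)))
    return index
```

The on-the-fly version only touches the ancestors of the two concepts in the query, while the index version stores one entry per root-to-concept path. When concepts have several parents, the number of such paths can grow very quickly as more sources are added.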
For the paper:

Pedersen, T., Pakhomov, S. V. S., Patwardhan, S., & Chute, C. G. (2007). Measures of semantic similarity and relatedness in the biomedical domain.

These experiments were done on SNOMEDCT prior to its inclusion in the UMLS and prior to the creation of the UMLS-Similarity package. To replicate those experiments in a subsequent paper (http://www-users.cs.umn.edu/~bthomson/publications/btmcinnes-amia2009.pdf), we used:

SAB :: include SNOMEDCT
REL :: include PAR, CHD

I hope this helps! Let us know if you have any additional questions or if something isn't clear!

Best regards,
Bridget

On Tue, Jul 8, 2014 at 2:30 PM, [email protected] [umls-similarity] <[email protected]> wrote:
> Hi
>
> I had some questions related to indexing and configurations.
>
> I have tried running UMLS::Similarity with the following configuration:
>
> SAB :: include MSH, RXNORM, ICD9CM, NCI, SNOMEDCT_US
> REL :: include PAR, CHD
>
> I was running the indexing on a fairly powerful machine (16-core CPU with
> 64G RAM). I let the indexing run for a week and it occupied more than
> 500G but was still running. This is understandable considering the number
> of sources I have added is large and that the graph size would grow
> exponentially.
>
> From previous threads I understand SNOMEDCT takes a day. I can definitely
> afford running it longer if I can add more sources.
>
> I wish to have more coverage of concepts and hence wish to add more
> sources. What is the best compromise to achieve more sources within a
> reasonable amount of time?
>
> Also, what is the exact configuration used for the paper
> Pedersen, T., Pakhomov, S. V. S., Patwardhan, S., & Chute, C. G. (2007).
> Measures of semantic similarity and relatedness in the biomedical domain.
>
> Your input would be very helpful.
>
> Chaitanya.
