Re: [umls-similarity] Practical large coverage configuration

Bridget McInnes [email protected] [umls-similarity] Tue, 08 Jul 2014 13:51:26 -0700

Hello Chaitanya,

Given the size of the graph and the number of links between nodes that
your configuration file produces, I would suggest using the --realtime
option rather than building the index. When I run experiments using the
entire UMLS, this is usually what I do because of space and time issues. The
--realtime option calculates the path information between the concepts
on the fly rather than running a DFS through the taxonomy and pre-storing
path information in an index.
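
To illustrate the on-the-fly idea, here is a minimal sketch in Python. It is
not the package's Perl implementation: the toy taxonomy and all function
names are hypothetical, and it assumes a path-style measure defined as 1
divided by the number of nodes on the shortest path through a common
ancestor. Only the ancestors of the two query concepts are visited, so
nothing has to be enumerated or stored for the rest of the graph.

```python
from collections import deque

# Hypothetical toy taxonomy: each concept maps to its parents (PAR relations).
PARENTS = {
    "root": [],
    "disease": ["root"],
    "heart_disease": ["disease"],
    "lung_disease": ["disease"],
    "myocarditis": ["heart_disease"],
    "pneumonia": ["lung_disease"],
}


def ancestor_distances(concept):
    """Walk upward from one concept; return {ancestor: number of edges}."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:  # first visit is the shortest (BFS)
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def shortest_path_length(a, b):
    """Nodes on the shortest path from a to b through a common ancestor."""
    dist_a = ancestor_distances(a)
    dist_b = ancestor_distances(b)
    common = set(dist_a) & set(dist_b)
    if not common:
        return None  # no shared ancestor, so no path in this taxonomy
    # Edges up from a plus edges up from b, plus 1 to count nodes.
    return min(dist_a[c] + dist_b[c] for c in common) + 1


def path_similarity(a, b):
    """Assumed path-style measure: 1 / (nodes on the shortest path)."""
    length = shortest_path_length(a, b)
    return None if length is None else 1.0 / length
```

Each query costs work proportional to the number of ancestors of the two
concepts. A pre-built index pays for every path in the configured sources up
front, which is where the disk space and run time go.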

Regarding the paper Pedersen, T., Pakhomov, S. V. S., Patwardhan, S., & Chute,
C. G. (2007). Measures of semantic similarity and relatedness in the
biomedical domain: those experiments were done on SNOMEDCT prior to its
inclusion in the UMLS and prior to the creation of the UMLS-Similarity
package. To replicate those experiments in a subsequent paper (
http://www-users.cs.umn.edu/~bthomson/publications/btmcinnes-amia2009.pdf),
we used:

SAB :: include SNOMEDCT
REL :: include PAR, CHD

I hope this helps!

Let us know if you have any additional questions or something isn't clear!

Best regards,

Bridget

On Tue, Jul 8, 2014 at 2:30 PM, [email protected]
[umls-similarity] <[email protected]> wrote:

>
>
> Hi
>
>
> I had some questions related to indexing and configurations.
>
>
> I have tried running UMLS::Similarity with the following configuration:
>
>
> SAB :: include MSH, RXNORM, ICD9CM, NCI, SNOMEDCT_US
> REL :: include PAR, CHD
>
> I was running the indexing on a fairly powerful machine (16-core CPU with
> 64G RAM). I let the indexing run for a week and it occupied more than
> 500G but was still running. This is understandable considering the number
> of sources I have added is large and that the graph size would grow
> exponentially.
>
> From previous threads I understand SNOMEDCT takes a day. I can definitely
> afford running it longer if I can add more sources.
>
> I wish to have more coverage of concepts and hence wish to add more
> sources. What is the best compromise to achieve more sources within a
> reasonable amount of time?
>
> Also what is the exact configuration used for the paper
> Pedersen, T., Pakhomov, S. V. S., Patwardhan, S., & Chute, C. G. (2007).
> Measures of semantic similarity and relatedness in the biomedical domain.
>
> Your input would be very helpful.
>
> Chaitanya.
>
