This issue of how to compute a large number of similarity pairs has come up in the past. The mail archive may have some ideas, although I'm not aware of a really good solution having come to light. Here is one example from the archive, and I think there are others:
https://www.mail-archive.com/[email protected]/msg00229.html

In general, vector and lesk are quite a bit slower than the other measures, although none would be considered fast, I am sure. It would be nice to hear whether other users have had some experience with this - I've generally not done experiments with huge numbers of pairs, so I don't have a lot of first-hand knowledge to pass on.

On Tue, Aug 27, 2019 at 4:20 PM [email protected] [umls-similarity] <[email protected]> wrote:

> Hello. I've succeeded in running umls-similarity.pl from the shell.
>
> I want to use and cite this project in a research paper I'm writing, so I
> want to compute similarity and relatedness on a big dataset that I have
> (it can grow to over a million pairs).
>
> The current problem is that running it from the shell takes too long
> (even after building the index on the first run).
>
> My questions are:
>
> 1. What is the best way to calculate similarity and relatedness on a big
> dataset?
>
> 2. Is it possible to compute similarity and relatedness on a big dataset?
>
> 3. Does anyone have code examples for performing this job or a similar one?
> For example, calculating similarity and relatedness on pairs of words in
> some format (csv, txt, xlsx, json, etc.)? I'm not familiar with Perl, but
> I'll learn whatever is needed in order to run this code.
>
> Thanks a lot!
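One thing that helps regardless of measure is to avoid paying the startup cost (loading the interface, connecting to the database) once per pair. If I remember right, umls-similarity.pl can read pairs from an input file (check your local --help for the exact option name and line format; "term1<>term2" per line is what I recall). Here is a rough sketch, in Python only because the poster said they don't know Perl yet, of splitting a big CSV of pairs into batch files so each batch is one invocation of the script. The file name pairs.csv, the batch size, and the <> line format are assumptions to adjust for your setup:

```python
# Sketch: split a CSV of term pairs into batch files for umls-similarity.pl.
# Assumptions (verify against your local --help): the script accepts an
# input file whose lines look like "term1<>term2". File names and the
# batch size below are placeholders.
import csv

def write_batches(csv_path, batch_size=10000, prefix="batch"):
    """Read (term1, term2) rows from csv_path and write numbered batch
    files, one "term1<>term2" pair per line. Returns the file names."""
    paths = []
    with open(csv_path, newline="") as f:
        # Keep only rows that actually have two columns.
        rows = [row for row in csv.reader(f) if len(row) >= 2]
    for i in range(0, len(rows), batch_size):
        path = f"{prefix}_{i // batch_size:04d}.txt"
        with open(path, "w") as out:
            for t1, t2 in (r[:2] for r in rows[i:i + batch_size]):
                out.write(f"{t1.strip()}<>{t2.strip()}\n")
        paths.append(path)
    return paths
```

Each batch file can then be fed to one run of umls-similarity.pl (again, see --help for the input-file option), so the index and database connection are set up once per batch rather than once per pair, and independent batches can be run in parallel if your MySQL server can take the load.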
