This issue of how to compute a large number of similarity pairs has come up in the past. The mail archive may have some ideas, although I'm not aware of a really good solution having come to light. Here is one example from the archive, and I think there are others:
https://www.mail-archive.com/[email protected]/msg00229.html

In general, vector and lesk are quite a bit slower than the other measures, although none would be considered fast, I am sure. It would be nice to hear whether other users have had some experience with this - I've generally not done experiments with huge numbers of pairs, so I don't have a lot of first-hand knowledge to pass on.

On Tue, Aug 27, 2019 at 4:20 PM [email protected] [umls-similarity] <[email protected]> wrote:

> Hello. I've succeeded in running umls-similarity.pl from the shell.
>
> I want to use and cite this project in a research paper I'm writing, so I
> want to compute similarity and relatedness on a big dataset that I have
> (it can grow to over a million pairs).
>
> The current problem is that running it from the shell takes too long
> (even after building the index on the first run).
>
> My questions are:
>
> 1. What is the best way to calculate similarity and relatedness on a big
> dataset?
>
> 2. Is it possible to compute similarity and relatedness on a big dataset?
>
> 3. Does anyone have code examples for performing this job or a similar one?
> For example, calculating similarity and relatedness on pairs of words in
> some format (csv, txt, xlsx, json, etc.)? I'm not familiar with Perl, but
> I'll learn whatever is needed in order to run this code.
>
> Thanks a lot!
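One thing that helps regardless of measure is to avoid paying the startup cost (loading the interface, connecting to the database) once per pair. If I remember right, umls-similarity.pl can read pairs from an input file (check your local --help for the exact option name and line format; "term1<>term2" per line is what I recall). Here is a rough sketch, in Python only because the poster said they don't know Perl yet, of splitting a big CSV of pairs into batch files so each batch is one invocation of the script. The file name pairs.csv, the batch size, and the <> line format are assumptions to adjust for your setup:

```python
# Sketch: split a CSV of term pairs into batch files for umls-similarity.pl.
# Assumptions (verify against your local --help): the script accepts an
# input file whose lines look like "term1<>term2". File names and the
# batch size below are placeholders.
import csv

def write_batches(csv_path, batch_size=10000, prefix="batch"):
    """Read (term1, term2) rows from csv_path and write numbered batch
    files, one "term1<>term2" pair per line. Returns the file names."""
    paths = []
    with open(csv_path, newline="") as f:
        # Keep only rows that actually have two columns.
        rows = [row for row in csv.reader(f) if len(row) >= 2]
    for i in range(0, len(rows), batch_size):
        path = f"{prefix}_{i // batch_size:04d}.txt"
        with open(path, "w") as out:
            for t1, t2 in (r[:2] for r in rows[i:i + batch_size]):
                out.write(f"{t1.strip()}<>{t2.strip()}\n")
        paths.append(path)
    return paths
```

Each batch file can then be fed to one run of umls-similarity.pl (again, see --help for the input-file option), so the index and database connection are set up once per batch rather than once per pair, and independent batches can be run in parallel if your MySQL server can take the load.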
