See my responses inline.

On Fri, Aug 8, 2014 at 3:10 PM, Shivade, Chaitanya P. <
[email protected]> wrote:

>  Hello
>
>
>  Could you please guide me on this ?
>
>
>   Chaitanya.
>   ------------------------------
> *From:* Shivade, Chaitanya P.
> *Sent:* Friday, August 01, 2014 3:54 PM
> *To:* [email protected]; [email protected]
> *Subject:* Similarity Evaluation
>
>
>    Hi
>
>
>  I was trying to replicate the results from the paper:
>
> Pedersen, T., Pakhomov, S. V. S., Patwardhan, S., & Chute, C. G. (2007).
> Measures of semantic similarity and relatedness in the biomedical domain.
> Journal of Biomedical Informatics, 40(3), 288–99.
> doi:10.1016/j.jbi.2006.06.004
>
>
>  The paper says "We scored each of the 29 test bed pairs using each of
> the measures, and then computed the correlation between the measures’
> output and the human expert judgment scores shown in Table 1. These
> correlations are shown in Table 2."
>
>
>  I downloaded the 29 pair dataset from:
>
> http://rxinformatics.umn.edu/SemanticRelatednessResources.html
>
>
>  I used the following configuration with the UMLS2013AA database:
>  SAB :: include SNOMEDCT
>  REL :: include PAR,CHD
>
> I had the following questions:
>
>
>  1. Three of the 29 pairs return a score of -1:
>  C0175895 C0009814
>  C0027627 C0001418
>  C0020473 C0027627
>  The second and third pairs contain CUI C0027627 (Tumor metastasis), which
>  is not in SNOMEDCT.
>  But both CUIs are present in SNOMEDCT for the first pair.
>  Are these pairs kept in the dataset while calculating the correlation?
>  If so, is the score of -1 kept as is?
>

In the first case, -1 tells us that no path exists between the CUIs.
However, please note that the 2013 version of SNOMEDCT is significantly
different from the version used in the 2007 paper, so different scores may
well have been obtained for these pairs. I think we got valid scores for
all 29 pairs in the 2007 paper. So you might want to restrict your pairs
to those for which you get valid scores. That will make it harder to
compare your results directly with ours, but given the changes in SNOMEDCT
over the years I am not sure there's a better solution.
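
To make the suggestion above concrete, here is a minimal Python sketch of
dropping the pairs that come back as -1 before computing any correlation.
The row layout, the name drop_unscored, and every score other than -1 are
made up for illustration; only the three CUI pairs are the ones from your
question.

```python
# Hypothetical row layout: (cui1, cui2, human_score, measure_score).
# All scores except the -1 values are invented placeholders.
MISSING = -1.0

def drop_unscored(rows, missing=MISSING):
    """Keep only the rows whose measure score is not the 'missing' marker."""
    return [row for row in rows if row[3] != missing]

rows = [
    ("C0175895", "C0009814", 3.0, -1.0),   # returned -1 in the question
    ("C0027627", "C0001418", 2.5, -1.0),   # returned -1 in the question
    ("C0020473", "C0027627", 1.5, -1.0),   # returned -1 in the question
    ("CUI_A", "CUI_B", 4.0, 0.50),         # placeholder pair with a valid score
    ("CUI_C", "CUI_D", 1.0, 0.10),         # placeholder pair with a valid score
]

kept = drop_unscored(rows)
human = [row[2] for row in kept]      # human judgment column
measure = [row[3] for row in kept]    # similarity measure column
```

The two resulting lists are what would then go into the rank correlation,
so both columns always cover the same set of pairs.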

>
>  2. What specific test was used to calculate the correlation? I tried
> using Spearman's test, ignoring these three pairs (rcorr command in R), but
> got numbers that were very different from the paper.
>

The fact that you have 26 pairs rather than 29 may account for that. The
test we used was Spearman's rank correlation. You can find the script we
use here:

http://search.cpan.org/~btmcinnes/UMLS-Similarity-1.41/utils/spearmans.pl
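
If you want to sanity-check your numbers independently, Spearman's rank
correlation can be sketched in a few lines of Python. This is not a port
of spearmans.pl; it is an illustrative version that assumes tied values
get the average of their ranks and then takes the Pearson correlation of
the two rank lists. The function names are mine.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)
```

Calling spearman(human_scores, measure_scores) on the pairs you kept gives
a value between -1 and 1. Different tie handling can shift the result
slightly, so small differences between tools are not unusual.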


>
> 3. Does one use the Physician score and Coder score as it is to calculate
> correlation ?
>

Yes.

I hope this helps.

Good luck!
Ted


>
>
>  It would be great if you can help me with this.
>
>  Thank you !
>  Chaitanya.
>



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
