[R] Canberra distance

Hongbo Zhu Tue, 01 Dec 2009 13:47:44 -0800

Hi,

I am using R 2.9.0.  It seems the documentation for the calculation of
Canberra distance using stats::dist is ambiguous. Does anyone have the
original definition given in the Lance & Williams paper from Aust. Comput.
J. 1, 15-20, 1967?


When both vectors have a zero at the same position, those terms are not
simply omitted as the documentation states (see below). Instead, the
Canberra distance is calculated as described in Frédéric Chiroleu's post (
http://tolstoy.newcastle.edu.au/R/e3/help/07/10/1370.html ):
d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y)), where NZ is the number of
non-zero positions. This can also be seen from the example given in the
documentation for stats::dist (see below).
However, when there is no position where the values are zero in both
vectors, the Canberra distance is calculated using the formula given in the
documentation.

Examples:

> dist(rbind(c(1,2,3,4), c(2,3,4,5)), method='canberra')
          1
2 0.7873016

> dist(rbind(c(1,2,3,4,0), c(2,3,4,5,0)), method='canberra')
         1
2 0.984127

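The two results above can be reproduced by hand with a minimal sketch. It assumes the rescaling factor is (total positions) / (positions kept), which equals (NZ + 1)/NZ when exactly one position is dropped; that general form is an assumption consistent with these examples, not something the help page states. The helper name canberra_rescaled is hypothetical and not part of stats.

```r
# Hypothetical helper (not part of stats): Canberra distance with the
# rescaling that the examples above suggest.
canberra_rescaled <- function(x, y) {
  num <- abs(x - y)
  den <- abs(x + y)
  keep <- !(num == 0 & den == 0)   # drop terms that would be 0/0
  n_total <- length(x)
  n_kept <- sum(keep)
  if (n_kept == 0) return(NA_real_)
  # Sum the remaining terms, then scale up as if the dropped terms
  # were missing values (assumed factor: n_total / n_kept).
  sum(num[keep] / den[keep]) * n_total / n_kept
}

a <- canberra_rescaled(c(1, 2, 3, 4), c(2, 3, 4, 5))
b <- canberra_rescaled(c(1, 2, 3, 4, 0), c(2, 3, 4, 5, 0))
print(c(a, b))
```

With no shared zero the factor is 1 and the plain sum is returned; with one shared zero out of five positions the sum is multiplied by 5/4, which matches the second dist() result.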

> help(dist)
dist                  package:stats                  R Documentation

Distance Matrix Computation
 ......

 'canberra': sum(|x_i - y_i| / |x_i + y_i|).  Terms with zero
          numerator and denominator are omitted from the sum and
          treated as if the values were missing.

 ## example of binary and canberra distances.
 x <- c(0, 0, 1, 1, 1, 1)
 y <- c(1, 0, 1, 1, 0, 1)
 dist(rbind(x,y), method= "binary")
 ## answer 0.4 = 2/5
 dist(rbind(x,y), method= "canberra")
 ## answer 2 * (6/5)
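
The documented answer 2 * (6/5) can be broken down term by term. This is only a sketch of the arithmetic, assuming the 0/0 term is dropped and the remaining sum is scaled by (number of positions) / (number of terms kept); the variable names are illustrative.

```r
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)

terms  <- abs(x - y) / abs(x + y)        # position 2 gives 0/0, i.e. NaN
kept   <- !is.nan(terms)                 # 5 of the 6 positions remain
raw    <- sum(terms[kept])               # 1 + 0 + 0 + 1 + 0 = 2
scaled <- raw * length(x) / sum(kept)    # 2 * (6/5) = 2.4
print(c(raw = raw, scaled = scaled))
```

The plain sum over the kept terms is 2, so the factor 6/5 in the documented answer is the rescaling and not part of the sum itself.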

Thanks!
-- 
Hongbo

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
