Hi, I am using R 2.9.0. The documentation of how stats::dist calculates the Canberra distance seems ambiguous. Does anyone have the original definition given by Lance & Williams (Aust. Comput. J. 1, 15-20, 1967)?
When both vectors have zeros at the same positions, those terms are not simply omitted as the documentation of the function states (see below). Instead, the Canberra distance is calculated as described in Frédéric Chiroleu's post (http://tolstoy.newcastle.edu.au/R/e3/help/07/10/1370.html):

d(x,y) = (NZ + 1)/NZ * sum(abs(x-y)/(x+y))

where NZ is the number of non-zero positions (positions that are not zero in both vectors) and the sum runs over those positions. This can also be seen from the example in the documentation for stats::dist (see below). However, when no position is zero in both vectors, the Canberra distance is calculated with the formula given in the documentation.

Examples:

> dist(rbind(c(1,2,3,4), c(2,3,4,5)), method='canberra')
          1
2 0.7873016
> dist(rbind(c(1,2,3,4,0), c(2,3,4,5,0)), method='canberra')
         1
2 0.984127

> help(dist)
dist                  package:stats                  R Documentation

Distance Matrix Computation
......
'canberra': sum(|x_i - y_i| / |x_i + y_i|). Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.

## example of binary and canberra distances.
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x,y), method= "binary")
## answer 0.4 = 2/5
dist(rbind(x,y), method= "canberra")
## answer 2 * (6/5)

Thanks!
--
Hongbo
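A possible reading of the behaviour in the examples above: the help page for dist also says that when columns are excluded from a Canberra distance, "the sum is scaled up proportionally to the number of columns used". If the 0/0 terms are treated exactly like missing values, the result would be sum(...) * n/NZ, with n the vector length. That coincides with (NZ + 1)/NZ only when exactly one position is zero in both vectors. The sketch below checks this reading against the numbers quoted in this post. canberra_manual() is a hypothetical helper, not part of stats. It assumes finite inputs without NAs and uses the |x_i + y_i| denominator quoted from the help page (newer R releases document |x_i| + |y_i|, which agrees for non-negative data such as the examples here).

```r
# Hypothetical helper: reproduces the Canberra distance for two numeric
# vectors (finite, no NAs) under the assumption that 0/0 terms are dropped
# and the remaining sum is scaled up by n / NZ, as for missing values.
canberra_manual <- function(x, y) {
  stopifnot(length(x) == length(y))
  terms <- abs(x - y) / abs(x + y)  # 0/0 gives NaN
  keep  <- !is.nan(terms)           # omit terms with zero numerator and denominator
  n  <- length(x)                   # number of columns
  nz <- sum(keep)                   # number of columns actually used (NZ)
  if (nz == 0) return(NA_real_)
  sum(terms[keep]) * n / nz         # scale up proportionally, n / NZ
}

x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
canberra_manual(x, y)                              # 2.4, i.e. 2 * (6/5)
as.numeric(dist(rbind(x, y), method = "canberra")) # 2.4
canberra_manual(c(1, 2, 3, 4, 0), c(2, 3, 4, 5, 0)) # 62/63, about 0.984127
```

With two shared zeros, e.g. c(1,2,0,0) and c(2,3,0,0), this sketch gives (1/3 + 1/5) * 4/2 = 16/15, whereas (NZ + 1)/NZ would give (8/15) * 3/2 = 0.8, so comparing that case against dist() would distinguish the two readings.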
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.