This is certainly ancient R history. The essence of the formula was last changed
-           dist += fabs(x[i1] - x[i2])/(x[i1] + x[i2]);
+           dist += fabs(x[i1] - x[i2])/fabs(x[i1] + x[i2]);

in October 1998.  The help page description came later.

The
           dist += fabs(x[i1] - x[i2])/(x[i1] + x[i2]);
form was there as 'canberra' in the first CVS archive in September 1997 (as src/library/mva/src/dist.c) so it looks like one of R&R was the original author and this could be called pre-history.
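
To make the effect of that change concrete, here is a minimal sketch of a single summand before and after it. This is not the code in dist.c: the helper names term_1997 and term_1998 are invented for this note, and the sketch makes no attempt to handle a zero denominator.

```c
#include <math.h>

/* Hypothetical helper: one summand in the September 1997 form,
   with a signed denominator. */
double term_1997(double x, double y)
{
    return fabs(x - y) / (x + y);
}

/* Hypothetical helper: one summand after the October 1998 change,
   with fabs() applied to the denominator. */
double term_1998(double x, double y)
{
    return fabs(x - y) / fabs(x + y);
}
```

For x = 1, y = 3 both helpers give 0.5. For x = -1, y = -3 the first gives -0.5 and the second gives 0.5, so the two forms agree only when x + y is positive.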

On Sun, 7 Feb 2010, bill.venab...@csiro.au wrote:

That is interesting.  The first of these, namely

sum(|x_i - y_i|) / sum(x_i + y_i)

is now better known in ecology as the Bray-Curtis distance.  Even more interesting is the
typo in Henry & Stevens "A Primer of Ecology in R" where the Bray-Curtis
distance formula is actually the Canberra distance (Eq. 10.2 p. 289).  There seems to be a
certain slipperiness of definition in this field.

What surprises me most is why ecologists still cling to this way of doing
things.  It is one of the few places I know of where the analysis is justified
purely heuristically and not from any kind of explicit model for the ecological 
processes under study.

Bill Venables.



________________________________________
From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] On Behalf 
Of Duncan Murdoch [murd...@stats.uwo.ca]
Sent: 07 February 2010 03:00
To: genol...@u-paris10.fr
Cc: r-devel@r-project.org
Subject: Re: [Rd] Canberra distance

On 06/02/2010 11:31 AM, Christophe Genolini wrote:
The definition I use is the one found in the book "Cluster analysis" by
Brian Everitt, Sabine Landau and Morven Leese.
They cite, as definition paper for Canberra distance, an article of
Lance and Williams "Computer programs for hierarchical polythetic
classification" Computer Journal 1966.
I do not have access, but here is the link :
http://comjnl.oxfordjournals.org/cgi/content/abstract/9/1/60
Hope this helps.


I do have access to that journal, and that paper gives the definition

sum(|x_i - y_i|) / sum(x_i + y_i)

and suggests the variation

sum( |x_i - y_i| / (x_i + y_i) )

It doesn't call either one the Canberra distance; it calls the first one
the "non-metric coefficient" and doesn't name the second.  (I imagine
the Canberra name came from the fact that the authors were at CSIRO in
Canberra.)

So I'd agree your definition is better, but I don't know if it can
really be called the "Canberra distance".
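
The difference between the two forms quoted from the paper (a ratio of sums versus a sum of ratios) can be sketched in C as follows. This is only an illustration: the function names ratio_of_sums and sum_of_ratios are invented here, and the sketch assumes non-negative data with every x_i + y_i strictly positive, so it does no handling of zero denominators.

```c
#include <math.h>
#include <stddef.h>

/* First form: sum(|x_i - y_i|) / sum(x_i + y_i). */
double ratio_of_sums(const double *x, const double *y, size_t n)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        num += fabs(x[i] - y[i]);
        den += x[i] + y[i];
    }
    return num / den;
}

/* Suggested variation: sum(|x_i - y_i| / (x_i + y_i)). */
double sum_of_ratios(const double *x, const double *y, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += fabs(x[i] - y[i]) / (x[i] + y[i]);
    return total;
}
```

With x = (1, 10) and y = (3, 10), the first form gives 2/24 (about 0.083) and the second gives 2/4 + 0/20 = 0.5. The first is dominated by the coordinates with the largest values, while the second weights each coordinate's relative difference equally.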

Duncan Murdoch

Christophe
On 06/02/2010 10:39 AM, Christophe Genolini wrote:
Hi the list,

According to what I know, the Canberra distance between X and Y is :
sum[ (|x_i - y_i|) / (|x_i|+|y_i|) ] (with | | denoting the function
'absolute value')
In the source code of the canberra distance in the file distance.c,
we find :

    sum = fabs(x[i1] + x[i2]);
    diff = fabs(x[i1] - x[i2]);
    dev = diff/sum;

which corresponds to the formula : sum[ (|x_i - y_i|) / (|x_i+y_i|) ]
(note that this does not define a distance... This is correct when
x_i and y_i are positive, but not when a value is negative.)

Is it on purpose or is it a bug?
It matches the documentation in ?dist, so it's not just a coding
error.  It will give the same value as your definition if the two
items have the same sign (not only both positive), but different
values if the signs differ.
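
A small sketch of that point, comparing one summand under each denominator. The helper names term_r and term_abs are invented for this illustration (they are not in distance.c), and neither helper guards against a zero denominator.

```c
#include <math.h>

/* Hypothetical helper: summand with the denominator |x + y|,
   as in the snippet quoted from the R source. */
double term_r(double x, double y)
{
    return fabs(x - y) / fabs(x + y);
}

/* Hypothetical helper: summand with the denominator |x| + |y|,
   as in the definition proposed at the start of the thread. */
double term_abs(double x, double y)
{
    return fabs(x - y) / (fabs(x) + fabs(y));
}
```

For x = -1, y = -3 (same sign) both helpers give 0.5. For x = 1, y = -3 (opposite signs) term_r gives 4/2 = 2 while term_abs gives 4/4 = 1. By the triangle inequality term_abs never exceeds 1, whereas term_r grows without bound as x approaches -y.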

The first three links I found searching Google Scholar for "Canberra
distance" all define it only for non-negative data.  One of them gave
exactly the R formula (even though the absolute value in the
denominator is redundant), the others just put x_i + y_i in the
denominator.

None of the 3 papers cited the origin of the definition, so I can't
tell you who is wrong.

Duncan Murdoch



______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
