Hi Tim,
Yes, you are right, this is an issue: BC (and other distance metrics) are 
sensitive to sampling intensity, which is often an artefact of the sampling 
technique.  Transformation is not a great solution to the problem - it works 
imperfectly, and its effects depend on the properties of your data.  There 
are lots of different types of datasets out there, each with different 
properties and different behaviour under different 
transformation/standardisation strategies, so there is no 
one-transformation-suits-all solution.  An illustration of this (in the case 
of row standardisation) is in the paper below:
        https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12843
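
For example, here is a toy illustration (mine, not from the paper; simulated 
data) of how an artefactual difference in sampling intensity inflates BC, and 
how row standardisation changes - but does not necessarily fix - the picture:

    ## Toy example: Bray-Curtis before and after row standardisation
    library(vegan)
    set.seed(42)
    comm <- matrix(rnbinom(60, mu = 10, size = 1), nrow = 6)  # 6 sites x 10 spp
    comm[1:3, ] <- comm[1:3, ] * 2   # double sampling intensity (pure artefact)
    vegdist(comm, method = "bray")                      # raw counts
    vegdist(decostand(comm, "total"), method = "bray")  # row-standardised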

The strategy I would advise here is to go a very different route and build a 
statistical model for the data.  You can then include row effects in the model 
to handle variation in sampling intensity across rows of data (along the lines 
of equation 2 of the above paper).  Or, if the magnitude of the variation in 
sampling intensity is known (e.g. it is due to changes in the size of the 
quadrats used for sampling, and quadrat size has been recorded), then the 
standard approach is to add an offset to the model.  There is plenty of 
software out there that can fit suitable statistical models with row effects 
(and offsets) for this sort of data, including the mvabund, HMSC, boral, and 
gllvm packages in R.  Importantly, these packages come with diagnostic tools 
to check that the analysis approach adequately captures key properties of your 
data - an essential step in any analysis.
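
For instance, a minimal sketch of what this could look like with the gllvm 
package (object names and the negative binomial family are just assumptions; 
substitute your own data and check the family against the diagnostics):

    ## 'abund': sites-by-species count matrix; 'env': data frame of
    ## predictors; 'quadrat_area': known sampling effort per site.
    library(gllvm)

    ## Row effects absorb between-row variation in sampling intensity:
    fit_row <- gllvm(y = abund, X = env, family = "negative.binomial",
                     row.eff = "random")

    ## If effort is known (e.g. quadrat size), use an offset instead:
    fit_off <- gllvm(y = abund, X = env, family = "negative.binomial",
                     offset = log(quadrat_area))

    ## Dunn-Smyth residual plots to check model assumptions:
    plot(fit_row, which = 1:2)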

All the best
David


Professor David Warton
School of Mathematics and Statistics, Evolution & Ecology Research Centre, 
Centre for Ecosystem Science
UNSW Sydney
NSW 2052 AUSTRALIA
phone +61(2) 9385 7031
fax +61(2) 9385 7123
 
http://www.eco-stats.unsw.edu.au



----------------------------------------------------------------------

Date: Tue, 2 Apr 2019 17:15:45 +0200
From: Tim Richter-Heitmann <trich...@uni-bremen.de>
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] interpreting ecological distance approaches (Bray
        Curtis after various data transformation)
Message-ID: <3834fea1-040a-12b5-c3a3-633e68dc6...@uni-bremen.de>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Dear list,

I am not an ecologist by training, so please bear with me.

It is my understanding that Bray-Curtis distances seem to be sensitive to 
differences in community size.  Thus, they seem to deliver inadequate results 
when those differences are the result of technical artifacts rather than 
biology (see e.g. Weiss et al., 2017, on microbiome data).

Therefore, I often see BC distances computed on relative abundance data 
(where BC seems to be equivalent to the Manhattan distance) or on data which 
has been subsampled to even depth (e.g. rarefying).  Sometimes I also see 
Bray-Curtis distances calculated on Hellinger-transformed data, which is the 
square root of relative abundance data.  This again makes sample sizes unequal 
(though only to a small degree), so I wondered whether this is a valid 
approach, especially considering that the "natural" distance choice for 
Hellinger-transformed data is Euclidean (to obtain, well, the Hellinger 
distance).

Another question is what the different sizes (i.e. the row sums) of 
Hellinger-transformed communities represent.  I tested some datasets and 
couldn't find a correlation between the original sample sizes and their 
Hellinger-transformed counterparts.
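
To make the comparison concrete, what I tried was along these lines (simulated 
counts, using vegan):

    library(vegan)
    set.seed(1)
    comm <- matrix(rpois(60, lambda = 5), nrow = 6)  # 6 samples x 10 taxa

    hel <- decostand(comm, method = "hellinger")  # sqrt of relative abundances

    ## "Natural" pairing: Euclidean on Hellinger data = Hellinger distance
    d_hellinger <- dist(hel)

    ## The approach I am asking about: Bray-Curtis on Hellinger data
    d_bc_hel <- vegdist(hel, method = "bray")

    ## Row sums of Hellinger-transformed data vs original sample sizes
    cbind(raw = rowSums(comm), hellinger = rowSums(hel))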

Any advice is very much welcome. Thank you.

--
Dr. Tim Richter-Heitmann

University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069



_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
