Great point David! Since Tim was referring to microbial communities, the gjam package is similar to mvabund, boral etc. and the microbial example discussed in the following paper might be of interest.
https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1241

With that much being about R itself, I may go a bit off topic: in all those multivariate GLM approaches, is there a way to disentangle richness differences (or nestedness) and turnover, as we can do with pairwise distances? (See the inspiring discussion between Carvalho et al. and Baselga et al., summarized in http://onlinelibrary.wiley.com/doi/10.1111/geb.12207/abstract ) Since different biological processes may cause these patterns, separating richness differences from species turnover is of interest. Maybe the row effect in those multivariate GLMs could be estimated as a response to environmental predictors?

Cheers,
Torsten

On Thu, 4 Apr 2019 at 01:19, David Warton <david.war...@unsw.edu.au> wrote:

> Hi Tim,
> Yes, you are right that this is an issue: BC (and other distance metrics)
> are sensitive to sampling intensity, which is often an artefact of the
> sampling technique. Transformation is not a great solution to the problem -
> it works imperfectly and will have different effects depending on the
> properties of your data. There are lots of different types of datasets out
> there, each with different properties, and different behaviours under
> different transformation/standardisation strategies, so there is no
> one-transformation-suits-all solution. An illustration of this (in the
> case of row standardisation) is in the paper below:
>
> https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12843
>
> The strategy I would advise here is to go a very different route and build
> a statistical model for the data. You can then include row effects in the
> model to handle variation in sampling intensity across rows of data (along
> the lines of equation 2 of the above paper). Or if the magnitude of the
> variation in sampling intensity is known (e.g.
> it is due to changes in sizes of quadrats used for sampling, and quadrat
> size has been recorded), then the standard approach to handle this is to
> add an offset to the model. There is plenty of software out there that can
> fit suitable statistical models with row effects (and offsets) for this
> sort of data, including the mvabund, HMSC, boral, and gllvm packages in R.
> Importantly, these packages come with diagnostic tools to check that the
> analysis approach adequately captures key properties of your data - an
> essential step in any analysis.
>
> All the best
> David
>
>
> Professor David Warton
> School of Mathematics and Statistics, Evolution & Ecology Research Centre,
> Centre for Ecosystem Science
> UNSW Sydney
> NSW 2052 AUSTRALIA
> phone +61(2) 9385 7031
> fax +61(2) 9385 7123
>
> http://www.eco-stats.unsw.edu.au
>
> ----------------------------------------------------------------------
>
> Date: Tue, 2 Apr 2019 17:15:45 +0200
> From: Tim Richter-Heitmann <trich...@uni-bremen.de>
> To: r-sig-ecology@r-project.org
> Subject: [R-sig-eco] interpreting ecological distance approaches (Bray Curtis after various data transformation)
> Message-ID: <3834fea1-040a-12b5-c3a3-633e68dc6...@uni-bremen.de>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> Dear list,
>
> I am not an ecologist by training, so please bear with me.
>
> It is my understanding that Bray-Curtis distances seem to be sensitive to
> different community sizes. Thus, they seem to deliver inadequate results
> when the different community sizes are the result of technical artifacts
> rather than biology (see e.g. Weiss et al., 2017 on microbiome data).
>
> Therefore, I often see BC distances made on relative data (which seems to
> be equivalent to the Manhattan distance) or on data which has been
> subsampled to even sizes (e.g. rarefying). Sometimes I also see Bray-Curtis
> distances calculated on Hellinger-transformed data,
> which is the square root of relative data.
> This again makes sample sizes unequal (but only to a small degree), so I
> wondered if this is a valid approach, especially considering that the
> "natural" distance choice for Hellinger-transformed data is Euclidean (to
> obtain, well, the Hellinger distance).
>
> Another question is what the different sizes (i.e. the sums) of
> Hellinger-transformed communities represent. I tested some datasets, and
> couldn't find a correlation between original sample sizes and their
> Hellinger-transformed counterparts.
>
> Any advice is very much welcome. Thank you.
>
> --
> Dr. Tim Richter-Heitmann
>
> University of Bremen
> Microbial Ecophysiology Group (AG Friedrich)
> FB02 - Biologie/Chemie
> Leobener Straße (NW2 A2130)
> D-28359 Bremen
> Tel.: 0049(0)421 218-63062
> Fax: 0049(0)421 218-63069
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
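
For readers who want to see the arithmetic behind the points in this thread, here is a minimal numeric sketch in plain Python. Everything in it is an assumption made for illustration: the counts are made up, and the helper names (bray_curtis, manhattan, relative, hellinger) are hypothetical and do not come from mvabund, gjam, or any other package mentioned above. It only shows three things: Bray-Curtis on raw counts reacts to sampling depth alone, Bray-Curtis on relative data is half the Manhattan distance, and the row sums of Hellinger-transformed data do not depend on the original totals.

```python
import math

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

def manhattan(a, b):
    """Manhattan (city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))

def relative(a):
    """Relative abundances: each count divided by the row total."""
    total = sum(a)
    return [x / total for x in a]

def hellinger(a):
    """Hellinger transformation: square root of relative abundances."""
    return [math.sqrt(p) for p in relative(a)]

# Made-up counts: 'shallow' and 'deep' have identical composition but a
# tenfold difference in sampling depth; 'even' has a more even composition.
shallow = [10, 20, 70]
deep = [100, 200, 700]
even = [300, 300, 400]

# 1. Raw counts: the dissimilarity is driven purely by sampling depth.
bc_raw = bray_curtis(shallow, deep)
# 2. Relative data: the depth effect disappears.
bc_rel = bray_curtis(relative(shallow), relative(deep))
# 3. On relative data (rows sum to 1) Bray-Curtis is half the Manhattan distance.
bc_rel_even = bray_curtis(relative(shallow), relative(even))
half_manhattan = manhattan(relative(shallow), relative(even)) / 2

# 4. Row sums after the Hellinger transformation ignore the original totals;
#    they are bounded by sqrt(number of taxa) and grow with evenness.
h_sums = {name: sum(hellinger(row))
          for name, row in [("shallow", shallow), ("deep", deep), ("even", even)]}

print(f"BC raw counts, same composition:  {bc_raw:.3f}")
print(f"BC relative, same composition:    {bc_rel:.3f}")
print(f"BC relative vs half Manhattan:    {bc_rel_even:.3f} vs {half_manhattan:.3f}")
print("Hellinger row sums:", {k: round(v, 3) for k, v in h_sums.items()})
```

In this toy case the Hellinger row sums are the same for the shallow and deep samples and larger for the more even one, which is consistent with finding no correlation between original sample sizes and the transformed sums. It is not a substitute for the model-based approach with row effects or offsets recommended above.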