Hello, I have just realized in the original paper, the t test is defined as: `t = h1-h2 -(µ1µ2)/(var1-var2)^1/2`. is the term -(µ1µ2) missing in your formula? How to calculate µ1µ2? Thank you
On Thu, Sep 10, 2020 at 2:41 PM Luigi Marongiu <marongiu.lu...@gmail.com> wrote: > > Update: > I also added the confidence interval for the Shannon index: > ``` > #! Hutcheson's t-test for Shannon diversity equality > # thanks to Karl Schilling and Rui Barradas > hutcheson = function(A, B){ > # compute Shannon index, variance and sum of elements > A_index <- Shannon(A) > B_index <- Shannon(B) > A_var <- ShannonVar(A) > B_var <- ShannonVar(B) > A_sum <- sum(A) > B_sum <- sum(B) > # compute Hutcheson > DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum) > test <- (A_index-B_index) /sqrt(A_var + B_var) > p <- 2*pt(-abs(test),df= DF) > if (p < 0.001) { > P = "<0.001" > } else { > P = round(p, 3) > } > if (p < 0.001) { > S = "***" > } else if (p < 0.01) { > S = "**" > } else if (p < 0.05) { > S = "*" > } else { > S = "" > } > # closure > cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon > index first group: \t", > round(A_index, 3), " (", round((A_index-2*A_var),3), "-", > round((A_index+2*A_var),3), > ")\n\tShannon index second group: \t", > round(B_index, 3), " (", round((B_index-2*B_var),3), "-", > round((B_index+2*B_var),3), > ")\n\tp-value: ", P, " ", S, "\n", sep = "") > return(p) > } > ``` > > On Thu, Sep 10, 2020 at 2:10 PM Luigi Marongiu <marongiu.lu...@gmail.com> > wrote: > > > > Hello, > > thank you for the code. To explain better, when I used vegan, I did > > not count the species directly but simply prepared a dataframe where, > > for each species, I counted the number of samples bearing such > > species: > > ``` > > > > > str(new_df) > > 'data.frame': 3 obs. of 46 variables: > > $ NC_001416 Enterobacteria phage lambda : int 5 4 5 > > $ NC_001623 Autographa californica nucl...: int 7 7 7 > > $ NC_001895 Enterobacteria phage P2 : int 1 0 0 > > $ NC_004745 Yersinia phage L-413C : int 1 0 0 > > ``` > > here the triplettes refer to healthy, tumor and metastasis. The outcome is: > > ``` > > # Shannon index > > diversity(new_df) > > #> Normal Tumour Metastasis > > #> 2.520139 3.109512 1.890404 > > ``` > > Using iNext, I provided a list of all the species counted in a samples > > ``` > > > new_list > > $Healthy > > [1] 5 7 1 1 1 8 1 1 2 1 2 1 1 1 1 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 0 0 0 0 0 0 0 > > > > $Tumour > > [1] 4 7 0 0 0 7 0 0 1 0 1 0 0 0 0 2 0 0 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 > > 1 2 1 1 1 1 1 1 1 0 0 0 0 > > > > $Metastasis > > [1] 5 7 0 0 0 9 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > > 0 0 0 0 0 0 1 0 0 1 1 1 1 > > ``` > > From this I get: > > ``` > > mod = iNEXT(new_list, q=0, datatype="abundance") > > mod$AsyEst > > #Site Diversity Observed Estimator s.e. LCL UCL > > #1 Normal Species richness 18.000 41.368 19.683 23.563 116.155 > > #2 Normal Shannon diversity 12.430 21.343 5.183 12.430 31.501 > > #4 Tumour Species richness 30.000 94.776 42.936 49.848 241.396 > > #5 Tumour Shannon diversity 22.410 53.135 14.486 24.743 81.526 > > #7 Metastasis Species richness 10.000 27.379 22.821 12.443 133.640 > > #8 Metastasis Shannon diversity 6.622 9.980 3.102 6.622 16.059 > > ``` > > So here the Shannon index is 12 instead of 2.5... > > Using Karl's function, I get: > > ``` > > # compute Shannon > > norm_sIdx <- Shannon(array(as.numeric(unlist(new_list[1])))) > > canc_sIdx <- Shannon(array(as.numeric(unlist(new_list[2])))) > > meta_sIdx <- Shannon(array(as.numeric(unlist(new_list[3])))) > > norm_var <- ShannonVar(array(as.numeric(unlist(new_list[1])))) > > canc_var <- ShannonVar(array(as.numeric(unlist(new_list[2])))) > > meta_var <- ShannonVar(array(as.numeric(unlist(new_list[3])))) > > norm_sum <- sum(array(as.numeric(unlist(new_list[1])))) > > canc_sum <- sum(array(as.numeric(unlist(new_list[2])))) > > meta_sum <- sum(array(as.numeric(unlist(new_list[3])))) > > # compute Hutcheson > > degfree <- (norm_var + canc_var)**2 /(norm_var**2/norm_sum + > > canc_var**2 /canc_sum) > > test <- (norm_sIdx-canc_sIdx) /sqrt(norm_var + canc_var) > > (p <- 2*pt(-abs(test),df= degree)) > > > [1] 0.01825784 > > ``` > > remarkably, the indices are the same as obtained by vegan: > > ``` > > > norm_sIdx > > [1] 2.520139 > > > canc_sIdx > > [1] 3.109512 > > > meta_sIdx > > [1] 1.890404 > > ``` > > > > I tried Rui's function but I got an error, so I wrote it as > > ``` > > hutcheson = function(A, B){ > > # compute Shannon index, variance and sum of elements > > A_index <- Shannon(A) > > B_index <- Shannon(B) > > A_var <- ShannonVar(A) > > B_var <- ShannonVar(B) > > A_sum <- sum(A) > > B_sum <- sum(B) > > # compute Hutcheson > > DF <- (A_var + B_var)^2 /(A_var^2/A_sum + B_var^2/B_sum) > > test <- (A_index-B_index) /sqrt(A_var + B_var) > > p <- 2*pt(-abs(test),df= DF) > > # closure > > cat("Hutcheson's t-test for Shannon diversity equality\n\tShannon > > index first group: ", > > round(A_index, 4), "\n\tShannon index second group: ", round(B_index, > > 4), > > "\n\tp-value : ", round(p, 4), "\n", sep = "") > > return(p) > > } > > ``` > > and I got: > > ``` > > > > > n_t = hutcheson(array(as.numeric(unlist(new_list[1]))), > > > array(as.numeric(unlist(new_list[2])))) > > Hutcheson's t-test for Shannon diversity equality > > Shannon index first group: 2.5201 > > Shannon index second group: 3.1095 > > p-value : 0.0183 > > > n_m = hutcheson(array(as.numeric(unlist(new_list[1]))), > > > array(as.numeric(unlist(new_list[3])))) > > Hutcheson's t-test for Shannon diversity equality > > Shannon index first group: 2.5201 > > Shannon index second group: 1.8904 > > p-value : 0.0371 > > > t_m = hutcheson(array(as.numeric(unlist(new_list[2]))), > > > array(as.numeric(unlist(new_list[3])))) > > Hutcheson's t-test for Shannon diversity equality > > Shannon index first group: 3.1095 > > Shannon index second group: 1.8904 > > p-value : 0 > > ``` > > new_list[1]|[2]|[3] refer to healthy, tumor and metastasis. applied to > > the original Hutcheson data: > > ``` > > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) > > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) > > > hutcheson(bird_1956, bird_1957) > > Hutcheson's t-test for Shannon diversity equality > > Shannon index first group: 1.8429 > > Shannon index second group: 1.0689 > > p-value : 0 > > > > ``` > > This is to compare two groups at the time. I'll probably have to > > compensate for multiple testing... > > But if this all OK, then the case is closed. > > Thank you > > > > On Thu, Sep 10, 2020 at 1:04 PM Rui Barradas <ruipbarra...@sapo.pt> wrote: > > > > > > Hello, > > > > > > Sorry, there's an instruction missing. See inline. > > > > > > Às 11:44 de 10/09/20, Rui Barradas escreveu: > > > > If you want a function automating Karl's code, here it is. It returns an > > > > object of S3 class "htest", R's standard for hypothesis tests functions. > > > > The returned object can then be subset in the usual ways, ht$statistic, > > > > ht$parameter, ht$p.value, etc. > > > > > > > > > > > > library(QSutils) > > > > > > > > hutcheson.test <- function(x1, x2){ > > > > dataname1 <- deparse(substitute(x1)) > > > > dataname2 <- deparse(substitute(x2)) > > > > method <- "Hutcheson's t-test for Shannon diversity equality" > > > > alternative <- "the diversities of the two samples are not equal" > > > > h1 <- Shannon(x1) > > > > varh1 <- ShannonVar(x1) > > > > n1 <- sum(x1) > > > > h2 <- Shannon(x2) > > > > varh2 <- ShannonVar(x2) > > > > n2 <- sum(x2) > > > > degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2) > > > > tstat <- (h1 - h2)/sqrt(varh1 + varh2) > > > > p.value <- 2*pt(-abs(tstat), df = degfree) > > > > ht <- list( > > > > statistic = c(t = tstat), > > > > parameter = c(df = degfree), > > > > p.value = p.value, > > > > alternative = alternative, > > > > method = method, > > > > data.name = paste(dataname1, dataname2, sep = ", ") > > > > ) > > > > class(ht) <- "htest" > > > > ht > > > > } > > > > > > > > earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0) > > > > later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0) > > > > > > > > hutcheson.test(earlier, later) > > > > > > > > > > > > > > > > With the data you provided: > > > > > > > > > > > > bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) > > > > bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) > > > > bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0) > > > > bird_1959 <- c(0,0,14,59,26,68,0) > > > > bird_1960 <- c(0,0,73,66,71,25,0,109,63,1) > > > > > > > > hutcheson.test(bird_1956, bird_1957) > > > > > > > > > > > > > > > > > > > > Note that like David said earlier, there might be better ways to > > > > interpret Shannon's diversity index. If h is the sample's diversity, > > > > exp(h) gives the number of equally-common species with equivalent > > > > diversity. > > > > > > > > > > > > s1 <- Shannon(earlier) > > > > s2 <- Shannon(later) > > > > c(earlier = s1, later = s2) > > > > exp(c(earlier = s1, later = s2)) # Both round to 3 > > > > eq_common <- rep(1, 3) # Can be 1 specimen or any other > > > > number > > > > Shannon(eq_common) # Slightly greater than the samples' > > > > diversity > > > > > > > > > > > > > > # Create a list with all the data > > > birds <- mget(ls(pattern = "^bird")) > > > > > > > round(exp(sapply(birds, Shannon))) # Your data > > > > > > > > > Hope this helps, > > > > > > Rui Barradas > > > > > > > > > > > > > > > #------------------------------------- > > > > > > > > > > > > Earlier Karl wrote [1] that > > > > > > > > > > > > Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess > > > > that explains the minor numerical differences obtained with the code > > > > above and the published variances. > > > > > > > > > > > > I don't believe the published variances were computed with the published > > > > variance estimator. The code below computes the variances like QSutils > > > > and with formula (4) in Hutcheson's paper. The latter does not give the > > > > same results. > > > > > > > > var_est <- function(n){ > > > > s <- length(n) > > > > N <- sum(n) > > > > p <- n/N > > > > i <- p != 0 > > > > inv.p <- numeric(s) > > > > inv.p[i] <- 1/p[i] > > > > log.p <- numeric(s) > > > > log.p[i] <- log(p[i]) > > > > # > > > > term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N > > > > term2 <- (s - 1)/(2*N^2) > > > > # > > > > numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p * > > > > log.p) > > > > denom3 <- 6*N^3 > > > > term3 <- numer3/denom3 > > > > list( > > > > Bioc = term1 + term2, > > > > Hutch = term1 + term2 + term3 > > > > ) > > > > } > > > > > > > > Vh1 <- var_est(earlier) > > > > Vh1 > > > > all.equal(ShannonVar(earlier), Vh1$Bioc) > > > > ShannonVar(earlier) - Vh1$Bioc # FAQ 7.31 > > > > > > > > Vh2 <- var_est(later) > > > > Vh2 > > > > identical(ShannonVar(later), Vh2$Bioc) # TRUE > > > > > > > > > > > > > > > > [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html > > > > > > > > > > > > Hope this helps, > > > > > > > > Rui Barradas > > > > > > > > > > > > Às 09:38 de 10/09/20, Luigi Marongiu escreveu: > > > >> Update: > > > >> I can see that you used the function Shannon from the package QSutils. > > > >> This would supplement the iNext package I used and solve the problem. > > > >> Thank you. > > > >> > > > >> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu > > > >> <marongiu.lu...@gmail.com> wrote: > > > >>> > > > >>> Thank you very much for the code, that was very helpful. > > > >>> I got the article by Hutcheson -- I don't know if I can distribute it > > > >>> , given the possible copyrights, or if I can attach it here -- but it > > > >>> does not report numbers directly: it refers to a previous article > > > >>> counting bird death on a telegraph each year. The numbers > > > >>> are: > > > >>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3) > > > >>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21) > > > >>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0) > > > >>> bird_1959 <- c(0,0,14,59,26,68,0) > > > >>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1) > > > >>> > > > >>> This for sake of the argument. > > > >>> As for my problem, I implemented the Shannon index with the package > > > >>> iNext, which only gives me the index itself and the 95% CI. Even when > > > >>> I implemented it with vegan, I only got the index. Essentially I don't > > > >>> have a count of species I could feed into the Hutcheson's. Is there a > > > >>> way to extract these data? Or to run a Hutcheson's on the final index? > > > >>> Thank you > > > >>> > > > >>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling > > > >>> <karl.schill...@uni-bonn.de> wrote: > > > >>>> > > > >>>> Dear Luigi, > > > >>>> > > > >>>> below some code I cobbled together based on the Hutcheson paper you > > > >>>> mentioned. I was lucky to find code to calculate h and, importantly, > > > >>>> its > > > >>>> variance in the R-package QSutils - you may find it on the > > > >>>> Bioconductor > > > >>>> website. > > > >>>> > > > >>>> here is the code, along with an example. I also attach the code as an > > > >>>> R-file. > > > >>>> > > > >>>> Hope that helps. > > > >>>> All my best > > > >>>> > > > >>>> Karl > > > >>>> PS don't forget to adjust for multiple testing if you compare more > > > >>>> than > > > >>>> two groups. > > > >>>> K > > > >>>> > > > >>>> > > > >>>> # load package needed > > > >>>> # QSutils is on Bioconductor > > > >>>> library(QSutils) > > > >>>> > > > >>>> # here some exemplary data - these are the data from Pilou 1966 that > > > >>>> are > > > >>>> used > > > >>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970) > > > >>>> > > > >>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0) > > > >>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0) > > > >>>> # numbers of the first example used by Hutcheson were unfortunately > > > >>>> not > > > >>>> # available to me > > > >>>> > > > >>>> # here starts the code ; you may replace the variables "earlier" and > > > >>>> "later" > > > >>>> # by your own numbers. > > > >>>> > > > >>>> # calculate h, var(h) etc > > > >>>> h1 <- Shannon(earlier) > > > >>>> varh1 <- ShannonVar(earlier) > > > >>>> n1 <- sum (earlier) > > > >>>> h2 <- Shannon(later) > > > >>>> varh2 <- ShannonVar(later) > > > >>>> n2 <- sum (later) > > > >>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2) > > > >>>> > > > >>>> # compare numbers with those in the paper > > > >>>> h1 > > > >>>> h2 > > > >>>> varh1 > > > >>>> varh2 > > > >>>> # I assume that minor numerical differences are due to differences > > > >>>> in the > > > >>>> # numerical precision of computers in the early seventies and today > > > >>>> / KS > > > >>>> > > > >>>> # this is the actual t-test > > > >>>> t <- (h1-h2) /sqrt(varh1 + varh2) > > > >>>> p <- 2*pt(-abs(t),df= degfree) > > > >>>> p > > > >>>> > > > >>>> # that's it > > > >>>> # Best > > > >>>> # Karl > > > >>>> -- > > > >>>> Karl Schilling, MD > > > >>>> Professor of Anatomy and Cell Biology > > > >>>> Anatomisches Institut > > > >>>> Rheinische Friedrich-Wilhelms-Universität > > > >>>> Nussallee 10 > > > >>>> > > > >>>> D-53115 Bonn > > > >>>> Germany > > > >>>> > > > >>>> phone ++49-228-73-2602 > > > >>>> > > > >>> > > > >>> > > > >>> -- > > > >>> Best regards, > > > >>> Luigi > > > >> > > > >> > > > >> > > > > > > > > ______________________________________________ > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > > PLEASE do read the posting guide > > > > http://www.R-project.org/posting-guide.html > > > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > > -- > > Best regards, > > Luigi > > > > -- > Best regards, > Luigi -- Best regards, Luigi ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.