[R-sig-eco] Mantel test with multiple imputation, P values
Dear List, I have an environmental data set with several missing values that I am trying to relate to a community structure data set using a mantel test. One solution to the missing data problem seems to be multiple imputation; I am using the Amelia package. This generates several (five in this example) imputed data sets. I can run mantel on each of these and come up with five similar but not identical solutions. I figure I can average the mantel rho values. However, I am not sure what to do about the P values. From looking around online, it looks like I shouldn't take the average of p values. I found this reference < http://missingdata.lshtm.ac.uk/index.php?option=com_content&view=article&id=164:combining-p-values-from-multiple-imputations&catid=57:multiple-imputation&Itemid=98> that seems to have promising suggestions, but I can't seem to figure out how I'd implement any of these in R. I was hoping somebody might have a suggestion for how I could combine my p values. One option, I think would be to take the highest (worst) p value (in the example below, this would be p = 0.012). However for large numbers of imputations, I am believe that this method might be to conservative. Another option might be to take the p value corresponding to the median rho score (in the example below this would be p =0.008). Thoughts? -Jacob ##Example Code Below require(Amelia) require(vegan) require(ecodist) ##Species data matrix with environmental data that are missing some values. data(varespec) data(varechem) varechem.missings <- varechem[,c("N", "P", "K")] varechem.missings[c(1,5, 7, 15, 20),1] <- NA varechem.missings[c(1,2, 9, 21), 2] <- NA #I multiply impute the missing values with the Amelia package imps <- 5 #imps <- 25 varechem.amelia <- amelia(varechem.missings, m = imps) #for each imputation of the environmental data I run a mantel test and save #the results to mresults mresults <- NULL for(i in 1:imps){ varespec.dist <- vegdist(varespec) varechem.am.dist <- dist(varechem.amelia$imputations[[i]]) mresults <- rbind(mresults, (ecodist::mantel(varespec.dist~varechem.am.dist))) } mresults ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5% ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176 0.3389979 ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528 0.3346554 ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943 0.3279028 ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293 0.3288272 ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386 0.3359864 #based on these results what would be a reasonable p value to report for the environmental parameters relating to the community structure? ##end example [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Mantel test with multiple imputation, P values
Hi Jacob, comments below. On Aug 18, 2013 2:31 PM, "Jacob Cram" wrote: > > Dear List, > I have an environmental data set with several missing values that I > am trying to relate to a community structure data set using a mantel > test. One solution to the missing data problem seems to be multiple > imputation; I am using the Amelia package. This generates several (five in > this example) imputed data sets. I can run mantel on each of these and > come up with five similar but not identical solutions. I figure I can > average the mantel rho values. However, I am not sure what to do about the > P values. From looking around online, it looks like I shouldn't take the > average of p values. I found this reference < > http://missingdata.lshtm.ac.uk/index.php?option=com_content&view=article&id=164:combining-p-values-from-multiple-imputations&catid=57:multiple-imputation&Itemid=98 > > that seems to have promising suggestions, but I can't seem to figure out > how I'd implement any of these in R. So following that link and reading the Li et al. reference it looks as though the procedure is well described at the top of page 71. You get your parameter estimate from the usual procedure. The test statistic, written as "D", is the distance between the null value and the estimated value with some scaling specified in eq. 1.17. They use the F distribution and k and m (the number of imputations) degrees of freedom. I don't think you need to reinvent some inferior ecologists-only procedure for this. Krzysztof I was hoping somebody might have a > suggestion for how I could combine my p values. One option, I think would > be to take the highest (worst) p value (in the example below, this would be > p = 0.012). However for large numbers of imputations, I am believe that > this method might be to conservative. Another option might be to take the > p value corresponding to the median rho score (in the example below this > would be p =0.008). Thoughts? > -Jacob > > > ##Example Code Below > require(Amelia) > require(vegan) > require(ecodist) > > ##Species data matrix with environmental data that are missing some values. > data(varespec) > data(varechem) > varechem.missings <- varechem[,c("N", "P", "K")] > varechem.missings[c(1,5, 7, 15, 20),1] <- NA > varechem.missings[c(1,2, 9, 21), 2] <- NA > > #I multiply impute the missing values with the Amelia package > imps <- 5 > #imps <- 25 > varechem.amelia <- amelia(varechem.missings, m = imps) > > > #for each imputation of the environmental data I run a mantel test and save > #the results to mresults > mresults <- NULL > > for(i in 1:imps){ > varespec.dist <- vegdist(varespec) > varechem.am.dist <- dist(varechem.amelia$imputations[[i]]) > mresults <- rbind(mresults, > (ecodist::mantel(varespec.dist~varechem.am.dist))) > } > > mresults > > ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5% > ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176 0.3389979 > ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528 0.3346554 > ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943 0.3279028 > ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293 0.3288272 > ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386 0.3359864 > > #based on these results what would be a reasonable p value to report for > the environmental parameters relating to the community structure? > > ##end example > > [[alternative HTML version deleted]] > > ___ > R-sig-ecology mailing list > R-sig-ecology@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Mantel test with multiple imputation, P values
Krzystof, Thank you for your reply. It is promising that you think that Li's method should work for imputed mantel results. That said, I am a bit over my head with the math in the Li et al reference. Could you (or anyone on this list) provide an R-code example of how D would be calculated from the output of several mantel tests? Also, and forgive my ignorance if this statement is coming from the wrong direction, It is my understanding that mantel P values are generally calculated by actually permuting the rows of the similarity matrix and re-running the mantel test a given number of times (usually 1000). Accordingly, as far as I can tell there is no explicitly generated F value corresponding to a mantel p value. It seems like Li's method assumes I am generating P from an F table. Would it be appropriate to back calculate F from P, k and m? Thanks again, Jacob On Sun, Aug 18, 2013 at 2:19 PM, Krzysztof Sakrejda < krzysztof.sakre...@gmail.com> wrote: > Hi Jacob, comments below. > > On Aug 18, 2013 2:31 PM, "Jacob Cram" wrote: > > > > Dear List, > > I have an environmental data set with several missing values that I > > am trying to relate to a community structure data set using a mantel > > test. One solution to the missing data problem seems to be multiple > > imputation; I am using the Amelia package. This generates several (five > in > > this example) imputed data sets. I can run mantel on each of these and > > come up with five similar but not identical solutions. I figure I can > > average the mantel rho values. However, I am not sure what to do about > the > > P values. From looking around online, it looks like I shouldn't take the > > average of p values. I found this reference < > > > http://missingdata.lshtm.ac.uk/index.php?option=com_content&view=article&id=164:combining-p-values-from-multiple-imputations&catid=57:multiple-imputation&Itemid=98 > > > > that seems to have promising suggestions, but I can't seem to figure out > > how I'd implement any of these in R. > > So following that link and reading the Li et al. reference it looks as > though the procedure is well described at the top of page 71. You get your > parameter estimate from the usual procedure. The test statistic, written as > "D", is the distance between the null value and the estimated value with > some scaling specified in eq. 1.17. They use the F distribution and k and m > (the number of imputations) degrees of freedom. I don't think you need to > reinvent some inferior ecologists-only procedure for this. > > Krzysztof > > I was hoping somebody might have a > > suggestion for how I could combine my p values. One option, I think > would > > be to take the highest (worst) p value (in the example below, this would > be > > p = 0.012). However for large numbers of imputations, I am believe that > > this method might be to conservative. Another option might be to take > the > > p value corresponding to the median rho score (in the example below this > > would be p =0.008). Thoughts? > > -Jacob > > > > > > ##Example Code Below > > require(Amelia) > > require(vegan) > > require(ecodist) > > > > ##Species data matrix with environmental data that are missing some > values. > > data(varespec) > > data(varechem) > > varechem.missings <- varechem[,c("N", "P", "K")] > > varechem.missings[c(1,5, 7, 15, 20),1] <- NA > > varechem.missings[c(1,2, 9, 21), 2] <- NA > > > > #I multiply impute the missing values with the Amelia package > > imps <- 5 > > #imps <- 25 > > varechem.amelia <- amelia(varechem.missings, m = imps) > > > > > > #for each imputation of the environmental data I run a mantel test and > save > > #the results to mresults > > mresults <- NULL > > > > for(i in 1:imps){ > > varespec.dist <- vegdist(varespec) > > varechem.am.dist <- dist(varechem.amelia$imputations[[i]]) > > mresults <- rbind(mresults, > > (ecodist::mantel(varespec.dist~varechem.am.dist))) > > } > > > > mresults > > > > ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5% > > ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176 0.3389979 > > ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528 0.3346554 > > ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943 0.3279028 > > ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293 0.3288272 > > ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386 0.3359864 > > > > #based on these results what would be a reasonable p value to report for > > the environmental parameters relating to the community structure? > > > > ##end example > > > > [[alternative HTML version deleted]] > > > > ___ > > R-sig-ecology mailing list > > R-sig-ecology@r-project.org > > https://stat.ethz.ch/mailman/listinfo/r-sig-ecology > > [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
[R-sig-eco] point color in NMDS (vegan)
Dear List, This is Elaine. I am using metaMDS in package vegan to plot NMDS. However, I want to draw the points using different colors according to island sites. For example: island site 1: island B, C, D => blue island site 2: island A, E, F => green island site 3: island G, H, J => red Please kindly advise how to modify the following code to classify the sites by colors. Thank you Elaine library(MASS) library(vegan) island.NMDS <- metaMDS(island,k=2, distfun = betadiver, distance = "sim",trymax=100,zerodist="add") plot(island.NMDS, type = "n") # species as symbols points(island.NMDS, display = 'species', pch = '+', cex = 0.6) # sites as text text(island.NMDS, display = 'sites') [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Re: [R-sig-eco] Mantel test with multiple imputation, P values
I think D is the easy part, it should be the mean Mantel test statistic from all the imputed data sets. The rest I'm not sure yet, but if you get a good answer from someone else, I'd appreciate it if you shared it. Krzysztof On Aug 18, 2013 4:44 PM, "Jacob Cram" wrote: > Krzystof, >Thank you for your reply. It is promising that you think that Li's > method should work for imputed mantel results. That said, I am a bit over > my head with the math in the Li et al reference. Could you (or anyone on > this list) provide an R-code example of how D would be calculated from the > output of several mantel tests? > > Also, and forgive my ignorance if this statement is coming from the wrong > direction, It is my understanding that mantel P values are generally > calculated by actually permuting the rows of the similarity matrix and > re-running the mantel test a given number of times (usually 1000). > Accordingly, as far as I can tell there is no explicitly generated F value > corresponding to a mantel p value. It seems like Li's method assumes I am > generating P from an F table. Would it be appropriate to back calculate F > from P, k and m? > > Thanks again, > Jacob > > > > On Sun, Aug 18, 2013 at 2:19 PM, Krzysztof Sakrejda < > krzysztof.sakre...@gmail.com> wrote: > >> Hi Jacob, comments below. >> >> On Aug 18, 2013 2:31 PM, "Jacob Cram" wrote: >> > >> > Dear List, >> > I have an environmental data set with several missing values that >> I >> > am trying to relate to a community structure data set using a mantel >> > test. One solution to the missing data problem seems to be multiple >> > imputation; I am using the Amelia package. This generates several (five >> in >> > this example) imputed data sets. I can run mantel on each of these and >> > come up with five similar but not identical solutions. I figure I can >> > average the mantel rho values. However, I am not sure what to do about >> the >> > P values. From looking around online, it looks like I shouldn't take the >> > average of p values. I found this reference < >> > >> http://missingdata.lshtm.ac.uk/index.php?option=com_content&view=article&id=164:combining-p-values-from-multiple-imputations&catid=57:multiple-imputation&Itemid=98 >> > >> > that seems to have promising suggestions, but I can't seem to figure out >> > how I'd implement any of these in R. >> >> So following that link and reading the Li et al. reference it looks as >> though the procedure is well described at the top of page 71. You get your >> parameter estimate from the usual procedure. The test statistic, written as >> "D", is the distance between the null value and the estimated value with >> some scaling specified in eq. 1.17. They use the F distribution and k and m >> (the number of imputations) degrees of freedom. I don't think you need to >> reinvent some inferior ecologists-only procedure for this. >> >> Krzysztof >> >> I was hoping somebody might have a >> > suggestion for how I could combine my p values. One option, I think >> would >> > be to take the highest (worst) p value (in the example below, this >> would be >> > p = 0.012). However for large numbers of imputations, I am believe that >> > this method might be to conservative. Another option might be to take >> the >> > p value corresponding to the median rho score (in the example below this >> > would be p =0.008). Thoughts? >> > -Jacob >> > >> > >> > ##Example Code Below >> > require(Amelia) >> > require(vegan) >> > require(ecodist) >> > >> > ##Species data matrix with environmental data that are missing some >> values. >> > data(varespec) >> > data(varechem) >> > varechem.missings <- varechem[,c("N", "P", "K")] >> > varechem.missings[c(1,5, 7, 15, 20),1] <- NA >> > varechem.missings[c(1,2, 9, 21), 2] <- NA >> > >> > #I multiply impute the missing values with the Amelia package >> > imps <- 5 >> > #imps <- 25 >> > varechem.amelia <- amelia(varechem.missings, m = imps) >> > >> > >> > #for each imputation of the environmental data I run a mantel test and >> save >> > #the results to mresults >> > mresults <- NULL >> > >> > for(i in 1:imps){ >> > varespec.dist <- vegdist(varespec) >> > varechem.am.dist <- dist(varechem.amelia$imputations[[i]]) >> > mresults <- rbind(mresults, >> > (ecodist::mantel(varespec.dist~varechem.am.dist))) >> > } >> > >> > mresults >> > >> > ##mantelr pval1 pval2 pval3 llim.2.5% ulim.97.5% >> > ## [1,] 0.2137656 0.008 0.993 0.008 0.1015176 0.3389979 >> > ## [2,] 0.2162388 0.011 0.990 0.011 0.1207528 0.3346554 >> > ## [3,] 0.2149556 0.012 0.989 0.016 0.1319943 0.3279028 >> > ## [4,] 0.2101820 0.009 0.992 0.012 0.1217293 0.3288272 >> > ## [5,] 0.2135279 0.006 0.995 0.006 0.1130386 0.3359864 >> > >> > #based on these results what would be a reasonable p value to report for >> > the environmental parameters relating to the community structure? >> > >> > ##end example >> > >> > [[alternative HTML version deleted]] >> > >> > ___
Re: [R-sig-eco] point color in NMDS (vegan)
I assume that the dataset "island" is yours so I've made that equivalent to an example dataset in vegan named "dune". Below is a rather brute force way to assign colors. library(MASS) library(vegan) data(dune) island = dune island.NMDS <- metaMDS(island,k=2, distfun = betadiver, distance = "sim", trymax=100, zerodist="add") plot(island.NMDS, type = "n") # species as symbols col = rep("black",dim(island)[2]) c = colnames(island) col[c=="Belper" | c=="Elepal" | c=="Viclat" | c=="Brarut"] = "blue" col[c=="Airpra" | c=="Rumace" | c=="Potpal" | c=="Tripra"] = "red" col[c=="Sagpro" | c=="Agrsto" | c=="Hyprad" | c=="Poapra"] = "green" points(island.NMDS, display = 'species', pch = '+', cex = 0.6, col=col) # sites as text col = rep("black",dim(island)[1]) r = rownames(island) col[r=="2" | r=="17" | r=="18"] = "blue" col[r=="3" | r=="10" | r=="20"] = "red" text(island.NMDS, display = 'sites',col=col) Jim Baldwin USDA Forest Service -Original Message- From: r-sig-ecology-boun...@r-project.org [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Elaine Kuo Sent: Sunday, August 18, 2013 4:02 PM To: Subject: [R-sig-eco] point color in NMDS (vegan) Dear List, This is Elaine. I am using metaMDS in package vegan to plot NMDS. However, I want to draw the points using different colors according to island sites. For example: island site 1: island B, C, D => blue island site 2: island A, E, F => green island site 3: island G, H, J => red Please kindly advise how to modify the following code to classify the sites by colors. Thank you Elaine library(MASS) library(vegan) island.NMDS <- metaMDS(island,k=2, distfun = betadiver, distance = "sim",trymax=100,zerodist="add") plot(island.NMDS, type = "n") # species as symbols points(island.NMDS, display = 'species', pch = '+', cex = 0.6) # sites as text text(island.NMDS, display = 'sites') [[alternative HTML version deleted]] ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately. ___ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology