On Thu, Jan 27, 2011 at 11:30:37AM +0100, Serena Corezzola wrote:
> Hello everybody!
>
> I'm trying to define the optimal number of surveys to detect the highest
> number of species within a monitoring season/session.
>
> To do this I want to run all the possible combinations between a set of
> samples and to calculate the total number of species for each combination of
> 2, 3, 4 ... n samples events, so that at the end I will be able to define which
> is the lowest number of samples that I need to obtain the best result.
>
> I've already done this operation manually, just to see if it works, but the
> point is that some of my datasets have more than 30 samples and more than 35
> species, so that the number of combinations will be HUGE!
>
> So here is the question: I need to find a way for R to make all possible
> combinations of samples automatically, and then to automatically return the
> total number of species in every combination.
>
> I've tried to search for a loop script, or something like that. However, I'm
> relatively new to R and I don't know what I need to do. Can anyone help me?
>
> Here I've written a simple example of the operations I need to do, just to
> make my problem clearer.
>
> My dataset (matrix) has sample events by rows (U1,U2,U3) and detected
> species by columns.
>
> U<-read.table("C:\\Documents\\tre_usc.txt",header=T,row.names=1,sep="\t",dec = ",")

Hello:

For simplicity of preparing a reply, let me include your data as
an R command.

  U <- structure(list(Aadi = c(0L, 0L, 0L), Aagl = c(0L, 0L, 0L),
  Apap = c(0L, 0L, 0L), Aage = c(0L, 0L, 0L), Bdia = c(7L, 4L, 0L),
  Beup = c(0L, 2L, 0L), Crub = c(5L, 1L, 0L), Carc = c(0L, 0L, 0L),
  Cpam = c(1L, 0L, 14L)), .Names = c("Aadi", "Aagl", "Apap", "Aage",
  "Bdia", "Beup", "Crub", "Carc", "Cpam"), class = "data.frame",
  row.names = c("U1", "U2", "U3"))

     Aadi Aagl Apap Aage Bdia Beup Crub Carc Cpam
  U1    0    0    0    0    7    0    5    0    1
  U2    0    0    0    0    4    2    1    0    0
  U3    0    0    0    0    0    0    0    0   14

> First, I've created from this matrix all the subsets based on single
> samples,
>
> U1 <- U [c(1), ]
> U2 <- U [c(2), ]
> U3 <- U [c(3), ]
>
[...]
>
> then I've combined them summing each time the values of the chosen lines
> (total n° of combination = 4).
>
> U12<-U1+U2
> U13<-U1+U3
> U23<-U2+U3
> U123<-U1+U2+U3
>
[...]
>
> Then I've applied the command "length" to find the number of species for
> every new combination.
>
> length(U12[U12>0])
> [1] 4
>
> length(U13[U13>0])
> [1] 3
>

This can be partially automated as follows

  UM <- as.matrix(U)

  # each row of A selects one subset of the samples U1, U2, U3
  A <- rbind(
  c(1, 0, 0),
  c(0, 1, 0),
  c(0, 0, 1),
  c(1, 1, 0),
  c(1, 0, 1),
  c(0, 1, 1),
  c(1, 1, 1))

  # build row names such as "U12" from the selected sample indices
  rownam <- rep("U", times=nrow(A))
  for (i in 1:3) {
      rownam[A[, i] == 1] <- paste(rownam[A[, i] == 1], i, sep="")
  }
  dimnames(A) <- list(rownam, NULL)

  # row sums of the selected samples, one row per subset
  C <- A %*% UM
  C

       Aadi Aagl Apap Aage Bdia Beup Crub Carc Cpam
  U1      0    0    0    0    7    0    5    0    1
  U2      0    0    0    0    4    2    1    0    0
  U3      0    0    0    0    0    0    0    0   14
  U12     0    0    0    0   11    2    6    0    1
  U13     0    0    0    0    7    0    5    0   15
  U23     0    0    0    0    4    2    1    0   14
  U123    0    0    0    0   11    2    6    0   15

  # number of species present in each subset
  rowSums(C != 0)

    U1   U2   U3  U12  U13  U23 U123
     3    3    1    4    3    4    4

> Now I need to do this with 10 and 32 sample events...: (

If I understand you correctly, your real table U has 32 rows and you
want to consider all subsets of at most 10 rows. If this is so, then
the number of combinations is

  sum(choose(32, 1:10)) # [1] 107594212

A matrix with this number of rows and 35 columns requires about 30 GB
of memory (at 8 bytes per entry).
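
For a small number of rows, the indicator matrix A above need not be typed by hand. The following is only a sketch, assuming the 3-row example data; it uses combn() to enumerate the subsets, and the names subsets, richness and best_per_k are chosen here for illustration. It is not feasible for 32 rows, for the memory reason just given.

```r
# Example data from this message: 3 sample events (rows) x 9 species (columns).
U <- data.frame(
  Aadi = c(0L, 0L, 0L), Aagl = c(0L, 0L, 0L), Apap = c(0L, 0L, 0L),
  Aage = c(0L, 0L, 0L), Bdia = c(7L, 4L, 0L), Beup = c(0L, 2L, 0L),
  Crub = c(5L, 1L, 0L), Carc = c(0L, 0L, 0L), Cpam = c(1L, 0L, 14L),
  row.names = c("U1", "U2", "U3"))
UM <- as.matrix(U)
n <- nrow(UM)

# All non-empty subsets of the row indices, ordered by subset size.
subsets <- unlist(
  lapply(seq_len(n), function(k) combn(n, k, simplify = FALSE)),
  recursive = FALSE)

# Indicator matrix: one row per subset, 1 where a sample is included.
A <- t(sapply(subsets, function(idx) as.numeric(seq_len(n) %in% idx)))
rownames(A) <- sapply(subsets,
                      function(idx) paste0("U", paste(idx, collapse = "")))

# Summed counts per subset, then the number of species present.
C <- A %*% UM
richness <- rowSums(C != 0)

# Largest number of species reachable with k samples, for each k.
best_per_k <- tapply(richness, sapply(subsets, length), max)

# Rough size in GB of the full result for 32 rows, subsets of size 1..10
# and 35 species, assuming 8 bytes per entry.
size_gb <- sum(choose(32, 1:10)) * 35 * 8 / 1e9
```

Here richness reproduces the rowSums(C != 0) output shown above, and best_per_k gives, for each number of samples k, the best result over all subsets of that size.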

How do you want to summarize the results? There may be a more efficient
way to compute the required parameters. For example, the average number
of species contained in the sum of a random selection of k rows can be
computed easily, since the columns (species) can be considered
individually: for each column, the probability of getting a nonzero sum
can be computed without actually constructing all the subsets.

If you need a parameter that is harder to compute than the average, it
is possible to use simulation. In this case, not all subsets would be
generated, but only a smaller number of randomly chosen subsets of k rows
for a given k.

Petr Savicky.
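
As a rough sketch of the two suggestions above (not a definitive implementation): if all counts are nonnegative, a species with m nonzero rows out of n is missed by a random selection of k rows with probability choose(n - m, k) / choose(n, k), so the average can be obtained column by column. The function names expected_richness and simulate_richness, and the argument nsim, are made up for this illustration.

```r
# Example data from this thread: 3 sample events x 9 species.
UM <- matrix(c(0, 0, 0, 0, 7, 0, 5, 0,  1,
               0, 0, 0, 0, 4, 2, 1, 0,  0,
               0, 0, 0, 0, 0, 0, 0, 0, 14),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("U1", "U2", "U3"),
                             c("Aadi", "Aagl", "Apap", "Aage", "Bdia",
                               "Beup", "Crub", "Carc", "Cpam")))

# Average number of species over all subsets of k rows.
# Assumes nonnegative counts, so a column sum is nonzero exactly when
# at least one selected row is nonzero in that column.
expected_richness <- function(M, k) {
  n <- nrow(M)
  m <- colSums(M != 0)                    # nonzero rows per species
  sum(1 - choose(n - m, k) / choose(n, k))
}

# Number of species in nsim randomly chosen subsets of k rows.
simulate_richness <- function(M, k, nsim = 1000) {
  n <- nrow(M)
  replicate(nsim, {
    idx <- sample.int(n, k)               # k distinct rows
    sum(colSums(M[idx, , drop = FALSE]) != 0)
  })
}

avg <- sapply(1:3, function(k) expected_richness(UM, k))

set.seed(1)
sims <- simulate_richness(UM, k = 2, nsim = 200)
```

For the example data, avg agrees with the means of the exhaustive results for k = 1, 2, 3 (7/3, 11/3 and 4). From the simulated values one can estimate other summaries, for instance with quantile(sims) or max(sims).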