On May 23, 2010, at 10:00 AM, Kang Min wrote:

Hi,

I have a dataset that looks like the one below.

data
plot    plantno.    species
H       31          ABC
D       2           DEF
Y       54          GFE
E       12          ERF
Y       98          FVD
H       4           JKU
J       7           JFG
A       55          EGD
.       .           .
.       .           .
.       .           .

I want to select rows belonging to 7 random plots, 100 times.

So you should be thinking about a function that will do what you want exactly once and then wrapping it in replicate().


(There are 50 plots in total)
So I created a list of 100 vectors, each vector has 7 elements.

samp <- lapply(1:100, function(i) sample(LETTERS))

Please. "Minimal"!!!   5 samples should be enough for testing.

samp2 <- lapply(samp, "[", 1:7)

How can I select the 7 plots from 'data' using 'samp2'?

samp3 <- sample(LETTERS, 7)

You do not want to sample from LETTERS but rather from the vector of data named "plot". Otherwise you will not be creating a representative sample. And ... "plot" is a really crappy name for a column. Try to avoid naming your columns with names that are common functions. Confusion of the humans reading your code is the predictable result, and occasional "confusion" of the R interpreter also may occur.

[After reading your reply to Holtman.... Or maybe you do want to sample from LETTERS. The fix would be obvious.]

samp4 <- subset(data, plot %in% samp3) # this works

So this is what you want to do once:

samp1 <- function() subset(data, plot %in% sample(data$plot, 7) )

samp5 <- replicate(5, samp1())

samp5[,1] will be one sampled subset. (samp5 is now an array of lists.)

Unfortunately, I noticed that even with the minimal "data" example you provided (not in reproducible form, unfortunately) I was getting 7 or 8 samples, and realized that using letters to subset was creating some overlaps whenever "H" was sampled. So this is safer:

samp1 <- function() data[ sample(1:nrow(data), 7 ),]
samp5 <- replicate(5, samp1() )
for(i in 1:5) print(samp5[,i])

Then I noticed your reply to Holtman, so perhaps you really do want the first solution. Just so you understand, it might not be statistically correct.
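
If selecting whole plots really is the goal, here is a minimal sketch of that version, assuming each draw should contain 7 distinct plots. The data frame `dat`, its column `plot_id`, and the helper `draw_plots` are made-up names for illustration, not the poster's actual data. Sampling from unique() avoids drawing the same plot label twice, and simplify = FALSE keeps each draw as an ordinary data frame inside a list.

```r
set.seed(1)  # only so the illustration is repeatable

# Hypothetical stand-in for the poster's data: 26 plots, several plants per plot
dat <- data.frame(
  plot_id = sample(LETTERS, 200, replace = TRUE),
  plantno = sample(1:100, 200, replace = TRUE),
  stringsAsFactors = FALSE
)

# One draw: pick n distinct plot labels, keep every row from those plots
draw_plots <- function(d, n = 7) {
  chosen <- sample(unique(d$plot_id), n)
  d[d$plot_id %in% chosen, ]
}

# Repeat the draw; simplify = FALSE returns a plain list of data frames
draws <- replicate(5, draw_plots(dat), simplify = FALSE)

# Number of distinct plots in each draw
sapply(draws, function(x) length(unique(x$plot_id)))
```

draws[[1]] is then the first sampled subset, and the number of rows will differ between draws because plots hold different numbers of plants.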

--
David.



samp5 <- subset(data, plot %in% samp2[[1]]) # this works as well, but
I used a for loop to get it to select 7 plots 100 times.

for (i in nrow(samp2)) {
     samp6 <- subset(data, plot %in% samp2[[i]])
} # this doesn't work

Am I missing something, or is there a better solution?
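
A possible explanation for the loop above, offered as a sketch rather than a definitive diagnosis: samp2 is a list, so nrow(samp2) is NULL and the loop body never runs; and even with a correct index, samp6 would be overwritten on every pass. The toy data frame `dat` and its column `plot_id` below are hypothetical stand-ins for the real data.

```r
set.seed(2)  # only so the illustration is repeatable

# Hypothetical stand-in data, one row per plant
dat <- data.frame(
  plot_id = sample(LETTERS, 200, replace = TRUE),
  plantno = sample(1:100, 200, replace = TRUE),
  stringsAsFactors = FALSE
)

# Five vectors of 7 plot labels each (5 rather than 100, to keep it small)
samp2 <- lapply(1:5, function(i) sample(LETTERS, 7))

is.null(nrow(samp2))   # TRUE: a list has no rows, so for (i in NULL) does nothing

# Loop over the list positions and store every subset instead of overwriting
result <- vector("list", length(samp2))
for (i in seq_along(samp2)) {
  result[[i]] <- dat[dat$plot_id %in% samp2[[i]], ]
}

# The same thing without an explicit loop
result2 <- lapply(samp2, function(p) dat[dat$plot_id %in% p, ])
```

Either form leaves one data frame per sampled set of plots, so result[[3]] holds the rows for the third set.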

Thanks.
Kang Min

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
