[R] Question About Repeat Random Sampling from a Data Frame

Adam Carr Mon, 21 Dec 2009 07:13:13 -0800

Good Morning:

I've read many, many posts on the r-help system, and I feel compelled to admit 
up front that I am relatively new to R. I do have several reference books around 
me, but I cannot count myself among the fortunate who seem to have strong 
programming intuition.


I have a data set consisting of 1637 observations of five variables: tensile 
strength, yield strength, elongation, hardness and a character indicator with 
three levels: (Y)es, (N)o, and (F)ail.

My objective is to randomly sample various subsets from this data set and then 
evaluate these subsets using simple parameters, among them tests for normality, 
shape and skewness. The data set is ordered by the character variable prior to 
sampling, and the samples are weighted to mirror each level's representation in 
an overall, physical process.

I am sampling the data set using this code:

sample <- dataset[sample(1:1637, 500, replace = TRUE,
                         prob = c(rep(163.7/1637, 513),
                                  rep(245.5/1637, 197),
                                  rep(1227.8/1637, 927))), ]
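
One detail of the call above may be worth checking. sample() rescales prob so that it sums to 1, which means per-row weights of 0.10, 0.15 and 0.75 imply group shares proportional to (group size x weight): roughly 7%, 4% and 90% for groups of 513, 197 and 927 rows. The sketch below computes those implied shares on synthetic data. The column values, the group labels g1/g2/g3, and the reading that the intended shares are 10/15/75% are all assumptions, not something stated in the post.

```r
set.seed(1)  # only so this sketch is reproducible

# Synthetic stand-in for the real data; values and group labels are made up
n_grp <- c(513, 197, 927)  # group sizes taken from the rep() calls above
dataset <- data.frame(
  tensile    = rnorm(1637, 60, 5),
  yield      = rnorm(1637, 40, 4),
  elongation = rnorm(1637, 25, 3),
  hardness   = rnorm(1637, 70, 6),
  indicator  = rep(c("g1", "g2", "g3"), times = n_grp)
)

# Per-row weights exactly as written in the post
w_post <- rep(c(163.7, 245.5, 1227.8) / 1637, times = n_grp)

# Group shares those weights imply once sample() normalizes them
implied <- tapply(w_post, dataset$indicator, sum) / sum(w_post)

# Assumed alternative: give each group a total weight equal to a target share
# by dividing the share by the group size
w_share <- rep(c(0.10, 0.15, 0.75) / n_grp, times = n_grp)
target  <- tapply(w_share, dataset$indicator, sum)

one_sample <- dataset[sample(nrow(dataset), 500, replace = TRUE,
                             prob = w_share), ]
```

Comparing prop.table(table(one_sample$indicator)) with target shows whether a draw lands near the intended mix.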

What I would like to do is iterate this process to create many (say 500 or 
more) sampled sets of n=500 and then evaluate each set for the parameters of 
interest. I would actually be evaluating each variable within each subset for 
my characteristic of interest. I am familiar with sampling and saving single 
columns of data to do this sort of thing, but I am not sure how to accomplish 
this with a multiple-variable data set.

For example, I am currently iterating this using a clunky process:

mysamples <- list()
for (i in 1:10) {
  mysamples[[i]] <- dataset[sample(1:1637, 100, replace = TRUE,
                                   prob = c(rep(163.7/1637, 513),
                                            rep(245.5/1637, 197),
                                            rep(1227.8/1637, 927))), ]
}
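
The same loop can be written more compactly with lapply(), building the weight vector once rather than on every pass. This is only a sketch on synthetic data: draw_sample() is a hypothetical helper, and the column values and group labels are invented for illustration.

```r
set.seed(2)  # only so this sketch is reproducible

# Synthetic stand-in for the real 1637-row data frame
n_grp <- c(513, 197, 927)
dataset <- data.frame(
  tensile = rnorm(1637), yield = rnorm(1637),
  elongation = rnorm(1637), hardness = rnorm(1637),
  indicator = rep(c("g1", "g2", "g3"), times = n_grp)
)

# Weight vector built once, with the same per-row values as the loop above
w <- rep(c(163.7, 245.5, 1227.8) / 1637, times = n_grp)

# Hypothetical helper: one weighted resample of whole rows
draw_sample <- function(data, size, prob) {
  data[sample(nrow(data), size, replace = TRUE, prob = prob), ]
}

# A list of 10 data frames, each with 100 rows and all five columns
mysamples <- lapply(1:10, function(i) draw_sample(dataset, 100, w))

# Equivalent form:
# mysamples <- replicate(10, draw_sample(dataset, 100, w), simplify = FALSE)
```

Changing 1:10 and 100 to 1:500 and 500 gives the larger run described earlier.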

But this leaves me with the additional task of extracting each mysamples[[i]] 
element and converting it to a form on which I can apply a standard 
statistical function like mean() or skewness() to the variable columns within 
each subset. I have attempted to iteratively convert these lists using this code:

mat<-matrix(nrow=100,ncol=5)
for (i in 1:length(mysamples))
{mat[i]<-do.call('rbind',mysamples[i])}

but running the code generates the error message "number of items to replace is 
not a multiple of replacement length". I have tried, unsuccessfully, to 
understand my probably obvious mistake by reading many, many helpful r-help 
emails on this error.
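
A likely cause of that message: mat[i] addresses a single cell of the matrix, while do.call('rbind', mysamples[i]) returns an entire 100 x 5 data frame, so the replacement does not fit. The conversion may also be unnecessary, because each mysamples[[i]] is already a data frame. The sketch below applies statistics directly to the list. The toy data and the moment-based skew() function are assumptions for illustration; skew() stands in for whichever skewness() implementation is in use.

```r
set.seed(3)  # only so this sketch is reproducible

# Toy stand-in for the list built by the loop: 10 data frames of 100 rows
mysamples <- lapply(1:10, function(i) data.frame(
  tensile = rnorm(100), yield = rnorm(100),
  elongation = rnorm(100), hardness = rnorm(100),
  indicator = sample(c("Y", "N", "F"), 100, replace = TRUE)
))
num_cols <- c("tensile", "yield", "elongation", "hardness")

# [[ ]] returns the data frame itself; [ ] returns a list of length 1
first <- mysamples[[1]]
mean(first$tensile)

# Hypothetical moment-based skewness, so the sketch needs no extra package
skew <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# One row per sample, one column per numeric variable
col_means <- t(sapply(mysamples, function(s) colMeans(s[num_cols])))
col_skew  <- t(sapply(mysamples, function(s) sapply(s[num_cols], skew)))
shapiro_p <- t(sapply(mysamples, function(s)
  sapply(s[num_cols], function(x) shapiro.test(x)$p.value)))
```

Each result is a 10 x 4 matrix, so hist(col_skew[, "tensile"]), for instance, shows how the skewness of one variable varies across samples.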

Based on the small amount that I think I know about R, it seems to me that 
sampling the data frame and storing the samples in a list is likely a pretty 
inefficient way to do this task. Any help that any of you could provide to 
assist me in iteratively sampling the data frame, and storing the samples in a 
form on which I can apply other statistical tests, would be greatly appreciated.
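
One possible storage form, offered only as a sketch: stack all samples into a single long data frame with a replicate identifier, then summarize by that identifier. The column name rep_id, the synthetic data, and the group labels are assumptions made for illustration.

```r
set.seed(4)  # only so this sketch is reproducible

# Synthetic stand-in for the real data and the weights from the post
n_grp <- c(513, 197, 927)
dataset <- data.frame(
  tensile = rnorm(1637), yield = rnorm(1637),
  elongation = rnorm(1637), hardness = rnorm(1637),
  indicator = rep(c("g1", "g2", "g3"), times = n_grp)
)
w <- rep(c(163.7, 245.5, 1227.8) / 1637, times = n_grp)

# 10 weighted samples of 100 rows each, kept in a list
mysamples <- lapply(1:10, function(i)
  dataset[sample(nrow(dataset), 100, replace = TRUE, prob = w), ])

# Tag each sample with its replicate number, then stack them row-wise
stacked <- do.call(rbind, Map(function(s, i) cbind(rep_id = i, s),
                              mysamples, seq_along(mysamples)))

# Per-replicate means of the four numeric variables
agg <- aggregate(cbind(tensile, yield, elongation, hardness) ~ rep_id,
                 data = stacked, FUN = mean)
```

Here do.call(rbind, ...) receives the whole list at once, which is the usual way to use it, rather than one element at a time.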

Thank you very much for taking the time to consider my questions.

Adam 


      

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
