Re: [R] Question About Repeat Random Sampling from a Data Frame

Gustaf Rydevik Mon, 21 Dec 2009 07:22:18 -0800

On Mon, Dec 21, 2009 at 4:12 PM, Adam Carr <adamlc...@yahoo.com> wrote:
> Good Morning:
>
> I've read many, many posts on the r-help system and I feel compelled to 
> quickly admit that I am relatively new to R, I do have several reference 
> books around me, but I cannot count myself among the fortunate who seem to
> have strong programming intuition.
>
> I have a data set consisting of 1637 observations of five variables: tensile 
> strength, yield strength, elongation, hardness and a character indicator with 
> three levels: (Y)es, (N)o, and (F)ail.
>
> My objective is to randomly sample various subsets from this data set and 
> then evaluate these subsets using simple parameters, among them tests for
> normality, shape and skewness. The data set is ordered by the character 
> variable prior to sampling, and the samples are weighted to mirror 
> representation in an overall, physical process.
>
> I am sampling the data set using this code:
>
> sample <- dataset[sample(1:1637, 500,
>   prob = c(rep(163.7/1637, 513), rep(245.5/1637, 197), rep(1227.8/1637, 927)),
>   replace = TRUE), ]
>
> What I would like to do is iterate this process to create many (say 500 or 
> more) sampled sets of n=500 and then evaluate each set for the parameters of 
> interest. I would actually be evaluating each variable within each subset for 
> my characteristic of interest. I am familiar with sampling and saving single 
> columns of data to do this sort of thing, but I am not sure how to accomplish 
> this with a multiple-variable data set.
>
> For example, I am currently iterating this using a clunky process:
>
> mysamples <- list()
> for (i in 1:10) {
>   mysamples[[i]] <- dataset[sample(1:1637, 100,
>     prob = c(rep(163.7/1637, 513), rep(245.5/1637, 197), rep(1227.8/1637, 927)),
>     replace = TRUE), ]
> }
>
> But this leaves me with the additional task of defining each mysample[i] 
> iteration and converting it to a form on which I can apply a standard 
> statistical test like mean() or skewness() to the variable columns within 
> each subset. I have attempted to iteratively convert these lists using this 
> code:
>
> mat<-matrix(nrow=100,ncol=5)
> for (i in 1:length(mysamples))
> {mat[i]<-do.call('rbind',mysamples[i])}
>
> but running the code generates the error message: number of items to replace 
> is not a multiple of replacement length. I have tried unsuccessfully, by 
> reading many, many helpful r-help emails on this error, to understand my 
> probably obvious mistake.
>
> Based on the small amount that I think I know about R it seems to me that 
> sampling the data frame and containing the samples in a list is likely a 
> pretty inefficient way to do this task. Any help that any of you could 
> provide to assist me in iteratively sampling the data frame, and storing the 
> samples in a form on which I can apply other statistical tests would be 
> greatly appreciated.
>
> Thank you very much for taking the time to consider my questions.
>
> Adam


That's pretty much how I tend to do those things. What you seem to be
missing is the ?apply family, which runs a function over each element
of a list:

mysamples.means <- lapply(mysamples, function(x) mean(x[, 1]))
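
The same idea can be stretched to cover every numeric column at once. What follows is only a minimal sketch: the data frame is a simulated stand-in (the column names, the values, and the Y/N/F row order are assumptions, not the original data), and the per-row weights are copied from the question as posted.

```r
# Stand-in data: names, values and group order are assumptions for illustration
set.seed(1)
dataset <- data.frame(
  tensile   = rnorm(1637, 60, 5),
  yield     = rnorm(1637, 40, 4),
  elong     = rnorm(1637, 25, 3),
  hardness  = rnorm(1637, 70, 6),
  indicator = rep(c("Y", "N", "F"), times = c(513, 197, 927))
)

# Per-row sampling weights, as given in the original post
w <- rep(c(163.7, 245.5, 1227.8) / 1637, times = c(513, 197, 927))

# Ten weighted samples of 100 rows each, stored in a list
mysamples <- lapply(1:10, function(i) {
  dataset[sample(nrow(dataset), 100, replace = TRUE, prob = w), ]
})

# Column means per sample: one row per sample, one column per numeric variable
num_cols <- sapply(dataset, is.numeric)
sample_means <- t(sapply(mysamples, function(x) colMeans(x[, num_cols])))
```

Swapping colMeans() for another per-column summary (for example sapply(x[, num_cols], sd)) follows the same pattern. One point worth checking: sample() rescales prob across rows, so with these per-row weights the three groups end up with roughly 7%, 4% and 90% of the draws. If the intent was 10%, 15% and 75% per group, each row's weight would instead be the group share divided by the group size.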


Hope that gets you on your way. If you want more help, I'd suggest
including an example data set in your follow-up messages.
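
On the error from the quoted mat loop: mat[i] addresses a single element of the matrix, while the right-hand side is an entire data frame, so the two cannot line up. One possible way around it, sketched here with a simulated stand-in data frame (column names and values are assumptions), is to keep the list, tag each sample with an id, stack everything into one long data frame, and aggregate by id.

```r
set.seed(2)
# Stand-in data (assumed); two numeric columns are enough to show the idea
dataset <- data.frame(tensile  = rnorm(1637, 60, 5),
                      hardness = rnorm(1637, 70, 6))

# Per-row sampling weights, as given in the original post
w <- rep(c(163.7, 245.5, 1227.8) / 1637, times = c(513, 197, 927))

# replicate() with simplify = FALSE returns a list of data frames
mysamples <- replicate(
  10,
  dataset[sample(nrow(dataset), 100, replace = TRUE, prob = w), ],
  simplify = FALSE
)

# Tag each sample with its index, then stack into one long data frame
stacked <- do.call(rbind, Map(function(d, i) cbind(sample_id = i, d),
                              mysamples, seq_along(mysamples)))

# Mean of every variable within each sample
means_by_sample <- aggregate(. ~ sample_id, data = stacked, FUN = mean)
```

The long format keeps all samples in one object, so any function that accepts a grouping variable can be applied without unpacking the list again.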

/Gustaf

-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
