On Mon, Dec 21, 2009 at 4:12 PM, Adam Carr <adamlc...@yahoo.com> wrote:
> Good Morning:
>
> I've read many, many posts on the r-help system and I feel compelled to
> quickly admit that I am relatively new to R. I do have several reference
> books around me, but I cannot count myself among the fortunate who seem to
> have strong programming intuition.
>
> I have a data set consisting of 1637 observations of five variables: tensile
> strength, yield strength, elongation, hardness and a character indicator with
> three levels: (Y)es, (N)o, and (F)ail.
>
> My objective is to randomly sample various subsets from this data set and
> then evaluate these subsets using simple parameters, among them tests for
> normality, shape and skewness. The data set is ordered by the character
> variable prior to sampling, and the samples are weighted to mirror
> representation in an overall, physical process.
>
> I am sampling the data set using this code:
>
>   sample <- dataset[sample(1:1637, 500,
>     prob=c(rep(163.7/1637,513),rep(245.5/1637,197),rep(1227.8/1637,927)),
>     replace = TRUE),]
>
> What I would like to do is iterate this process to create many (say 500 or
> more) sampled sets of n=500 and then evaluate each set for the parameters of
> interest. I would actually be evaluating each variable within each subset for
> my characteristic of interest. I am familiar with sampling and saving single
> columns of data to do this sort of thing, but I am not sure how to accomplish
> this with a multiple-variable data set.
>
> For example, I am currently iterating this using a clunky process:
>
>   mysamples<-list()
>   for (i in 1:10){
>     mysamples[[i]] <- dataset[
>       sample(1:1637,100,
>         prob=c(rep(163.7/1637,513),rep(245.5/1637,197),rep(1227.8/1637,927)),
>         replace = TRUE), ]
>   }
>
> But this leaves me with the additional task of defining each mysample[i]
> iteration and converting it to a form on which I can apply a standard
> statistical test like mean() or skewness() to the variable columns within
> each subset. I have attempted to iteratively convert these lists using this
> code:
>
>   mat<-matrix(nrow=100,ncol=5)
>   for (i in 1:length(mysamples))
>   {mat[i]<-do.call('rbind',mysamples[i])}
>
> but running the code generates the error message: number of items to replace
> is not a multiple of replacement length. I have tried unsuccessfully, by
> reading many, many helpful r-help emails on this error, to understand my
> probably obvious mistake.
>
> Based on the small amount that I think I know about R it seems to me that
> sampling the data frame and containing the samples in a list is likely a
> pretty inefficient way to do this task. Any help that any of you could
> provide to assist me in iteratively sampling the data frame, and storing the
> samples in a form on which I can apply other statistical tests would be
> greatly appreciated.
>
> Thank you very much for taking the time to consider my questions.
>
> Adam

That's pretty much how I tend to do those things. What you seem to be
missing is the apply family (see ?apply and ?lapply). For example, to get
the mean of the first column of each sample:

  mysamples.means <- lapply(mysamples, function(x) mean(x[,1]))

Hope that gets you on your way. If you want more help, I'd suggest
including an example data set in your follow-up messages.

/Gustaf

--
Gustaf Rydevik, M.Sci.
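
As a rough sketch of how the lapply() suggestion above might extend to
every numeric column at once, the example below builds a made-up stand-in
for the data set (the column names, the values and the assignment of group
sizes to F/N/Y are assumptions, not taken from the original post), draws
the weighted samples into a list, and summarises each one. The skew()
helper is a hypothetical moment-based substitute for whatever skewness()
function is in use. The error in the quoted loop most likely comes from
mat[i] addressing a single cell while the right-hand side is a whole data
frame. Keeping the samples in a list, or stacking them with an identifier
column as in the last step, avoids that assignment altogether.

```r
# Hypothetical stand-in for the 1637-row data set; names and values are invented.
set.seed(1)
n <- 1637
dataset <- data.frame(
  tensile    = rnorm(n, 500, 40),
  yield      = rnorm(n, 350, 30),
  elongation = rnorm(n, 20, 3),
  hardness   = rnorm(n, 150, 10),
  # Group sizes come from the question; which level has which size is assumed.
  status     = rep(c("F", "N", "Y"), times = c(513, 197, 927))
)

# Per-row sampling weights, as in the question (sample() rescales prob itself).
w <- c(rep(163.7, 513), rep(245.5, 197), rep(1227.8, 927)) / n

# Draw n_sets weighted samples of set_size rows each, kept as a list.
n_sets   <- 10
set_size <- 100
mysamples <- replicate(
  n_sets,
  dataset[sample(n, set_size, replace = TRUE, prob = w), ],
  simplify = FALSE
)

# Hypothetical helper: simple moment-based skewness.
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# One row per sample, one column per numeric variable.
num_cols <- sapply(dataset, is.numeric)
means <- t(sapply(mysamples, function(s) colMeans(s[, num_cols])))
skews <- t(sapply(mysamples, function(s) sapply(s[, num_cols], skew)))

# Alternative: stack all samples into one data frame with a sample id,
# then summarise by id.
stacked <- do.call(rbind, Map(cbind, set = seq_along(mysamples), mysamples))
tensile_by_set <- aggregate(tensile ~ set, data = stacked, FUN = mean)
```

Raising n_sets to 500 and set_size to 500 should give the larger run
described in the question; the list form scales to that without changes.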