BEP wrote:
> Hello all,
>
> I am working with a very large data set in R, and I have no interest in
> reviving my SAS skills. To do this, I will need to drop unwanted variables,
> given the size of the data file. The most common strategy seems to be
> subsetting the data after it is read into R. Unfortunately, given the size
> of the data set, I can't get the file read in and then subsequently do the
> subset procedure. I would appreciate help on the following:
>
> 1. What are the possibilities of reading in just a small set of variables
> during the <read.table> statement (or another 'read' statement)? That is,
> is it possible to specify just the variables that I want to keep?
>
> 2. Can I randomly select a set of observations during the 'read' statement?
>
> I have searched various R resources for this information, so if I am simply
> overlooking a key resource on this issue, pointing that out to me would be
> greatly appreciated.
>
> Thanks in advance.
>
> Brian
Check this for input of specific columns from a large data matrix:

    mysubsetdata <- do.call("cbind",
        scan(file = "location and name of your file",
             what = list(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
                         0, 0, NULL, NULL),
             flush = TRUE))

Each NULL in the 'what' list tells scan() to skip that column, while a 0 marks
a column to be read as numeric. With the 13 entries above, only columns 10 and
11 are read into 'mysubsetdata'. With this method you can work out how to
select whichever columns you want.

HTH

Rubén
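For question 1, a read.table()-based alternative (a sketch, assuming a
whitespace-delimited file with 13 columns and no header; "bigfile.txt" is a
placeholder file name) is to mark unwanted columns as "NULL" in colClasses,
which makes read.table() drop them while reading:

    keep <- rep("NULL", 13)        # start by dropping every column
    keep[c(10, 11)] <- "numeric"   # keep only columns 10 and 11
    ## "bigfile.txt" is a placeholder; substitute your own file
    mysubsetdata <- read.table("bigfile.txt", colClasses = keep)

For question 2 (a random set of observations), one hedged sketch, under the
same assumptions, is to read the file in chunks through a connection and keep
each line with some probability, so the full parsed data set never has to sit
in memory at once:

    con <- file("bigfile.txt", open = "r")
    kept <- character(0)
    repeat {
        lines <- readLines(con, n = 10000)       # read 10000 lines at a time
        if (length(lines) == 0) break
        take <- runif(length(lines)) < 0.01      # keep roughly 1% of rows
        kept <- c(kept, lines[take])
    }
    close(con)
    mysample <- read.table(textConnection(kept)) # parse only the sampled rows

Adjust the chunk size and sampling fraction to taste; the point is simply that
rows are filtered before read.table() ever builds the data frame.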