On 2010-04-16 16:21, Sharpie wrote:
Josh B-3 wrote:
Hi,
I turn to you, the R Sages, once again for help. You've never let me down!
(1) Please make the following toy files:
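# x: each data line has one more field than the header, so the
# first field (indv.1, ...) becomes the row names
x <- read.table(textConnection("var.1 var.2 var.3 var.1000
indv.1 1 5 9 7
indv.210000 2 9 3 8"), header = TRUE)
# y: header only, i.e. a data frame with 2 columns and 0 rows
y <- read.table(textConnection("var.3 var.1000"), header = TRUE)
write.csv(x, file = "x.csv")
write.csv(y, file = "y.csv")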
(2) Pretend you are starting with the files "x.csv" and "y.csv." They come
from another source, an online database. Pretend that these files are
much, much, much larger. Specifically:
(a) Pretend that "x.csv" contains 1000 columns by 210,000 rows.
(b) "y.csv" contains just header titles. Pretend that there are 90
header titles in "y.csv" in total. These header titles are a subset of the
header titles in "x.csv."
(3) What I want to do is scan (or import, or whatever the appropriate word
is) only a subset of the columns from "x.csv" into R. Specifically, I
only want to scan the columns of data from "x.csv" into R that are
indicated in the file "y.csv." I still want to scan in all 210,000 rows
from "x.csv," but only for the aforementioned columns listed in "y.csv."
Can you guys recommend a strategy for me? I think I need to use the scan()
function, given how huge "x.csv" is, but I don't know exactly what
to do. Specific code that gets the job done would be the most useful.
Thank you very much in advance!
Josh
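Since the question mentions scan(), here is a minimal sketch of that route. It relies on scan()'s documented rule that a NULL component in a `what` list makes it skip the corresponding field. It assumes comma-separated files with a single header line, as write.csv() produces, and it reads every kept column as numeric. The object names (hdr, wanted, what, dat, subset_scan) are illustrative only.

```r
# Recreate the toy files from step (1) so this runs on its own
x <- read.table(text = "var.1 var.2 var.3 var.1000
indv.1 1 5 9 7
indv.210000 2 9 3 8", header = TRUE)
y <- read.table(text = "var.3 var.1000", header = TRUE)
write.csv(x, file = "x.csv")
write.csv(y, file = "y.csv")

# Header line of each file as a character vector (scan() strips the quotes).
# write.csv() names the row-name column "", so drop that from the wanted set.
hdr    <- scan("x.csv", what = "", sep = ",", nlines = 1, quiet = TRUE)
wanted <- setdiff(scan("y.csv", what = "", sep = ",", nlines = 1, quiet = TRUE), "")

# One 'what' component per field: NULL skips the field, numeric() reads it
# as numbers (assumes every wanted column is numeric)
what <- rep(list(NULL), length(hdr))
what[hdr %in% wanted] <- list(numeric())

# Read the data lines; skipped fields come back as NULL components
dat <- scan("x.csv", what = what, sep = ",", skip = 1, quiet = TRUE)
names(dat) <- hdr
subset_scan <- as.data.frame(Filter(Negate(is.null), dat))
str(subset_scan)
```

Unlike read.csv(), this route leaves type detection to you: if a kept field does not parse as a number, scan() stops with an error.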
read.csv.sql() from the sqldf package looks like it may do what you want: it
allows you to filter what gets read in from a CSV file using SQL statements,
something like:
SELECT list,of,column,names FROM file
Hope this helps!
-Charlie
That's probably the best way. A crude way might be to
read in one row from each file (using nrows = 1), then
use the names to define a colClasses vector whose
elements are NA for columns to be read and "NULL" for
columns to be skipped, and then read x.csv with that
colClasses vector.
I have no idea how slow this would be.
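A minimal sketch of that colClasses approach, assuming the toy files from step (1) (recreated below so the block runs on its own). read.csv() renames the empty header that write.csv() gives the row-name column to "X", so that name is dropped from the wanted list. The names wanted, all_cols, col_classes and subset_x are illustrative only.

```r
# Recreate the toy files from step (1)
x <- read.table(text = "var.1 var.2 var.3 var.1000
indv.1 1 5 9 7
indv.210000 2 9 3 8", header = TRUE)
y <- read.table(text = "var.3 var.1000", header = TRUE)
write.csv(x, file = "x.csv")
write.csv(y, file = "y.csv")

# Only the column names are needed: y.csv has no data rows,
# and one row of x.csv is enough to get its header
wanted   <- setdiff(names(read.csv("y.csv", nrows = 1)), "X")
all_cols <- names(read.csv("x.csv", nrows = 1))
stopifnot(all(wanted %in% all_cols))  # every requested column must exist

# NA = read the column (type guessed as usual), "NULL" = skip it entirely
col_classes <- ifelse(all_cols %in% wanted, NA, "NULL")

subset_x <- read.csv("x.csv", colClasses = col_classes)
str(subset_x)
```

The columns come back in the order of x.csv, not y.csv; subset_x[wanted] reorders them if that matters, and adding "X" to wanted keeps the row identifiers. For the real 210,000-row file, ?read.table suggests that giving explicit column types (for example "numeric" instead of NA) and an nrows value, even a mild overestimate, can reduce memory use.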
-Peter Ehlers
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.