On Wed, Feb 1, 2012 at 11:57 PM, HC <hca...@yahoo.co.in> wrote: > Hi All, > > I have a very (very) large tab-delimited text file without headers. There > are only 8 columns and millions of rows. I want to make numerous pieces of > this file by sub-setting it for individual stations. Station is given as in > the first column. I am trying to learn and use sqldf package for this but am > stuck in a couple of places. > > To simulate my requirement, I have taken iris dataset as an example and have > done the following: > (1) create a tab-delimited file without headers. > (2) read it using read.csv.sql command > (3) write the result of a query, getting first 10 records > > Here is the reproducible code that I am trying: > # Text data file > write.table(iris, "irisNoH.txt", sep = "\t", quote = FALSE, > col.names=FALSE,row.names = FALSE) > # create an empty database (can skip this step if database already exists) > sqldf("attach myTestdbT as new") > f1<-file("irisNoH.txt") > attr(f1, "file.format") <- list(header=FALSE,sep="\t") > # read into table called irisTab in the mytestdb sqlite database > read.csv.sql("irisNoH.txt", sql = "create table main.irisTab1 as select * > from file", dbname = "mytestdb") > res1<-sqldf("select * from main.irisTab1 limit 10", dbname = "mytestdb") > write.table(res1, "iris10.txt", sep = "\t", quote = FALSE, > col.names=FALSE,row.names = FALSE) > > # For querying records of a particular species - unresolved problems > #a1<-"virginica" > #attr(f1, "names") <- c("A1","A2","A3","A4","A5") > #res2<-fn$sqldf("select * from main.irisTab1 where A5 = '$a1'") > > In the above, I am not able to: > (1) assign the names to various columns > (2) query for particular value of a column; in this case for particular > species, say virginica > (3) I guess fn$sqldf can do the job but it requires assigning column names > > Any help would be most appreciated. >
Ignoring your iris file for a moment, to query the 5th column (getting its name via sql rather than via R) we can do this: library(sqldf) species <- "virginica" nms <- names(dbGetQuery(con, "select * from iris limit 0")) fn$dbGetQuery(con, "select * from iris where `nms[5]` = '$species' limit 3") Now, sqldf is best used when you are getting the data from R but if you want to store it in a database and just leave it there then you might be better off using RSQLite directly like this (the eol = "\r\n" in the dbWriteTable statement was needed on my Windows system but you may not need that depending on your platform): write.table(iris, "irisNoH.txt", sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE) library(sqldf) library(RSQLite) con <- dbConnect(SQLite(), dbname = "mytestdb") dbWriteTable(con, "iris", "irisNoH.txt", sep = "\t", eol = "\r\n") species <- "virginica" nms <- names(dbGetQuery(con, "select * from iris limit 0")) fn$dbGetQuery(con, "select * from iris where `nms[5]` = '$species' limit 3") dbDisconnect(con) -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.