On Thu, Feb 2, 2012 at 3:11 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote: > On Wed, Feb 1, 2012 at 11:57 PM, HC <hca...@yahoo.co.in> wrote: >> Hi All, >> >> I have a very (very) large tab-delimited text file without headers. There >> are only 8 columns and millions of rows. I want to make numerous pieces of >> this file by sub-setting it for individual stations. Station is given as in >> the first column. I am trying to learn and use sqldf package for this but am >> stuck in a couple of places. >> >> To simulate my requirement, I have taken iris dataset as an example and have >> done the following: >> (1) create a tab-delimited file without headers. >> (2) read it using read.csv.sql command >> (3) write the result of a query, getting first 10 records >> >> Here is the reproducible code that I am trying: >> # Text data file >> write.table(iris, "irisNoH.txt", sep = "\t", quote = FALSE, >> col.names=FALSE,row.names = FALSE) >> # create an empty database (can skip this step if database already exists) >> sqldf("attach myTestdbT as new") >> f1<-file("irisNoH.txt") >> attr(f1, "file.format") <- list(header=FALSE,sep="\t") >> # read into table called irisTab in the mytestdb sqlite database >> read.csv.sql("irisNoH.txt", sql = "create table main.irisTab1 as select * >> from file", dbname = "mytestdb") >> res1<-sqldf("select * from main.irisTab1 limit 10", dbname = "mytestdb") >> write.table(res1, "iris10.txt", sep = "\t", quote = FALSE, >> col.names=FALSE,row.names = FALSE) >> >> # For querying records of a particular species - unresolved problems >> #a1<-"virginica" >> #attr(f1, "names") <- c("A1","A2","A3","A4","A5") >> #res2<-fn$sqldf("select * from main.irisTab1 where A5 = '$a1'") >> >> In the above, I am not able to: >> (1) assign the names to various columns >> (2) query for particular value of a column; in this case for particular >> species, say virginica >> (3) I guess fn$sqldf can do the job but it requires assigning column names >> >> Any help would be most appreciated. >> > > Ignoring your iris file for a moment, to query the 5th column (getting > its name via sql rather than via R) we can do this: > > library(sqldf) > species <- "virginica" > nms <- names(dbGetQuery(con, "select * from iris limit 0")) > fn$dbGetQuery(con, "select * from iris where `nms[5]` = '$species' limit 3") > > Now, sqldf is best used when you are getting the data from R but if > you want to store it in a database and just leave it there then you > might be better off using RSQLite directly like this (the eol = "\r\n" > in the dbWriteTable statement was needed on my Windows system but you > may not need that depending on your platform): > > > write.table(iris, "irisNoH.txt", sep = "\t", quote = FALSE, col.names > = FALSE, row.names = FALSE) > > library(sqldf) > library(RSQLite) > > con <- dbConnect(SQLite(), dbname = "mytestdb") > > dbWriteTable(con, "iris", "irisNoH.txt", sep = "\t", eol = "\r\n") > > species <- "virginica" > nms <- names(dbGetQuery(con, "select * from iris limit 0")) > fn$dbGetQuery(con, "select * from iris where `nms[5]` = '$species' limit 3") > > dbDisconnect(con)
There seems to have been a pasting error here. The first part was intended to show how to do this using sqldf and the second using RSQLite. Thus the first part was intended to be: library(sqldf) species <- "virginica" # obviously we could just do nms <- names(iris) but to get # names from database instead nms <- names(dbGetQuery(con, "select * from iris limit 0")) # use 5th column fn$sqldf("select * from iris where `nms[5]` = '$species' limit 3") and the second part that illustrates RSQLite was ok. Note that fn$ comes from the gsubfn package which sqldf loads. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.