Hi Seth, Thanks for the reply. I provide info from sessionInfo() and about schema that you ask. Please take a look.
The output from sessionInfo() is > sessionInfo() R version 2.12.2 (2011-02-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UT> sessionInfo() R version 2.12.2 (2011-02-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] plyr_1.2.1 RSQLite_0.9-2 DBI_0.2-5 filehash_2.1-1F-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] plyr_1.2.1 RSQLite_0.9-2 DBI_0.2-5 filehash_2.1-1 The .schema of table annual_data3 is sqlite> .schema annual_data3 CREATE TABLE "annual_data3"( PERMNO INT, DATE INT, CUSIP TEXT, EXCHCD INT, SICCD INT, SHROUT INT, PRC REAL, RET REAL, ... pret_var, pRET_sd, nmret, pya_var, pya_sd, nya, pya_var_ebi, pya_sd_ebi, pya_var_ebit, pya_sd_ebit, pya_var_ebitda, pya_sd_ebitda, logage REAL, logasset REAL, ... loglead1stdaret, loglead2stdaret) Table annual_data3 is created by joining table annual_data2 and ya_vol. The column pya_var is initially in ya_vol. dbGetQuery(sql.industry, "create table annual_data3 as select a.*, b.pya_var, b.pya_sd, b.nya, b.pya_var_ebi, b.pya_sd_ebi, b.pya_var_ebit, b.pya_sd_ebit, b.pya_var_ebitda, b.pya_sd_ebitda from annual_data2 as a left join ya_vol as b on a.permno = b.permno and a.year = b.year order by permno, year") Table ya_vol is created by dbGetQuery(sql.industry, "create table ya_vol as select PERMNO, year, variance(ya) as pya_var, stdev(ya) as pya_sd, count(*) as nya, variance(ya_ebi) as pya_var_ebi, stdev(ya_ebi) as pya_sd_ebi, variance(ya_ebit) as pya_var_ebit, stdev(ya_ebit) as pya_sd_ebit, variance(ya_ebitda) as pya_var_ebitda, stdev(ya_ebitda) as pya_sd_ebitda from past_ya where ya is not null group by PERMNO, year order by PERMNO, year") The schema info of ya_vol is sqlite> .schema ya_vol CREATE TABLE ya_vol( PERMNO INT, year INT, pya_var, pya_sd, nya, pya_var_ebi, pya_sd_ebi, pya_var_ebit, pya_sd_ebit, pya_var_ebitda, pya_sd_ebitda ); CREATE INDEX ya_vol_permno_year_idx on ya_vol (permno,year); Interestingly, I find that the problem I reported does not for columns labeled real in the schema info. For example, the type of column RET never changes no matter what the first observation is. > str(dbGetQuery(sql.industry, + "select RET from annual_data3 + where RET is not null limit 5")) 'data.frame': 5 obs. of 1 variable: $ RET: num -0.03354 -0.02113 0.03797 0.0013 -0.00678 > > str(dbGetQuery(sql.industry, + "select RET from annual_data3 + where RET is null limit 5")) 'data.frame': 5 obs. of 1 variable: $ RET: num NA NA NA NA NA > sapply(dbGetQuery(sql.industry, + "select RET from annual_data3 + where RET is null limit 5"), + typeof) RET "double" > sapply(dbGetQuery(sql.industry, + "select RET from annual_data3 + where RET is not null limit 5"), + typeof) RET "double" I still don't know how to solve this problem for variable pya_var, please help. Thanks. Best, Jia On Mon, Feb 28, 2011 at 6:48 PM, Seth Falcon <s...@userprimary.net> wrote: > Hi Jia, > > On Mon, Feb 28, 2011 at 12:37 PM, chen jia <chen_1...@fisher.osu.edu> wrote: >> When I extract data from SQLite to R, the data types (or modes) of the >> extracted data seems to be determined by the value of the first row. >> Please see the following example. > > It would help to provide the output of sessionInfo() as well as the > schema definition for the table in SQLite (or at least description of > how it was created). > > Here's an example that works as you'd like: > > > library(RSQLite) > > db = dbConnect(SQLite(), dbname = ":memory:") > > dbGetQuery(db, "create table t (a int, b real, c text)") > > df = data.frame(a=c(NA, 1L, 2L), b=c(NA, 1.1, 2.2), c=c(NA, "x", > "y"),stringsAsFactors=FALSE) > > df > a b c > 1 NA NA <NA> > 2 1 1.1 x > 3 2 2.2 y > > dbGetPreparedQuery(db, "insert into t values (?, ?, ?)", df) > > dbGetQuery(db, "select * from t") > a b c > 1 NA NA <NA> > 2 1 1.1 x > 3 2 2.2 y > > sapply(dbGetQuery(db, "select * from t"), typeof) > a b c > "integer" "double" "character" > > sapply(dbGetQuery(db, "select * from t limit 1"), typeof) > a b c > "integer" "double" "character" > > sapply(dbGetQuery(db, "select a from t limit 1"), typeof) > a > "integer" > > sapply(dbGetQuery(db, "select a from t limit 2"), typeof) > a > "integer" > > sapply(dbGetQuery(db, "select a from t limit 1"), typeof) > a > "integer" > > >> sessionInfo() > R version 2.11.1 (2010-05-31) > x86_64-apple-darwin9.8.0 > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices datasets utils methods base > > other attached packages: > [1] RSQLite_0.9-4 DBI_0.2-5 > > loaded via a namespace (and not attached): > [1] tools_2.11.1 > > > > > -- > Seth Falcon | @sfalcon | http://userprimary.net/ > -- 700 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 http://www.fisher.osu.edu/~chen_1002/ ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.