[R] How do I solve a disk I/O error with RSQLite?
Hi there, I had an error when using RSQLite. My code and the error message are the following: > dbGetQuery(sql.dispersion, +"create table monthly_data_temp as + select a.*, b.industry + from monthly_data as a left join +siccd_industry_ff49 as b + on a.siccd = b.siccd + order by permno, date") Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (RS_SQLite_exec: could not execute1: disk I/O error) Calls: dbGetQuery ... dbGetQuery -> sqliteQuickSQL -> sqliteExecStatement -> .Call Execution halted I ran the same code by using sqlite3 from the command line and had no error. $ sqlite3 dispersion.db SQLite version 3.7.9 2011-11-01 00:52:41 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> create table monthly_data_temp as ...> select a.*, b.industry ...> from monthly_data as a left join ...>siccd_industry_ff49 as b ...> on a.siccd = b.siccd ...> order by permno, date; sqlite> I initialized the connection for SQLite by using assign("sql.dispersion", dbConnect(SQLite(), loadable.extensions=TRUE, synchronous=0, cache_size=20, dbname="/home/chenj/Fun/Research/Projects/Dispersion/Data/dispersion.db"), envir = .GlobalEnv) and I also loaded an extension: dbGetQuery(sql.dispersion, "select load_extension('/home/chenj/Fun/SQLite/Extension/LibraryFile64Bit/libsqlitefunctions.so')") cat("Connected to sql.dispersion",fill=TRUE) The directory content for the database file is: $ ls -l total 34435604 -rw-r--r-- 1 chenj chenj 216 Jun 22 2012 C:\nppdf32Log\debuglog.txt -rw-r--r-- 1 chenj chenj 26947 Mar 22 2011 Data_Dispersion.ods drwxr-xr-x 2 chenj chenj4096 Dec 2 2011 Data_from_Hou drwxr-xr-x 2 chenj chenj 139264 Feb 2 23:30 dispersion -rw-r--r-- 1 chenj chenj 33468631040 Feb 12 10:55 dispersion.db -rw-r--r-- 1 chenj chenj 785633114 Nov 22 2009 monthly_data.csv -rwxr-xr-x 1 chenj chenj 1007551488 Nov 2 2007 monthly_data.sas7bdat If you could help me find out the cause of this error and solve it, it would be great! Please let me know if you need more information! Best, Jia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How do I test against a simple null that two regressions coefficients are equal?
Hi there, I run two regressions: y = a1 + b1 * x + e1 y = a2 + b2 * z + e2 I want to test against the null hypothesis: b1 = b2. How do I design the test? I think I can add two equations together and divide both sides by 2: y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 + e2). or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3 If I run this new regression, I can test against the null b1 = b2 in this regression. Is it an equivalent test as the original one? If yes, how do I do that in R? Alternatively, I think I can just test against the null: correlation(y, x) = correlation(y, z), where correlation(. , .) is the correlation between two random variables. Is this equivalent too? If yes, how do I do it in R? Thanks. Best, Jia -- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I test against a simple null that two regressions coefficients are equal?
Thanks, Chuck. I am reading the references, which are helpful. Just to understand what I have done wrong here, I proposed an alternative testing strategy: I run regressions y = a3 + b1 * x + b2 * z + e3 and test alternative hypothesis b1 != b2 against the null hypothesis b1 = b2 in this equation. Is it this the same test as y = a1 + b1*x + e1 y = a2 + b2*x + e2 test alternative hypothesis b1 != b2 against null hypothesis b1 = b2. Best, Jia On Wed, Jul 7, 2010 at 11:12 PM, Charles C. Berry wrote: > On Wed, 7 Jul 2010, chen jia wrote: > >> Hi there, >> >> I run two regressions: >> >> y = a1 + b1 * x + e1 >> y = a2 + b2 * z + e2 >> >> I want to test against the null hypothesis: b1 = b2. How do I design the >> test? >> > > You are testing a non-nested hypothesis, which requires special handling. > > The classical test is due to Hotelling, but see the references (and R code > snippets) in this posting: > > http://markmail.org/message/egnowmdzpzjtahy7 > > (it is the merest coincidence that the above thread was initiated by Mark > Leeds and that the URL is 'markmail' :-) ) > > HTH, > > Chuck > > >> I think I can add two equations together and divide both sides by 2: >> y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 + e2). >> or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3 >> >> If I run this new regression, I can test against the null b1 = b2 in >> this regression. Is it an equivalent test as the original one? If >> yes, how do I do that in R? >> >> Alternatively, I think I can just test against the null: >> correlation(y, x) = correlation(y, z), where correlation(. , .) is the >> correlation between two random variables. Is this equivalent too? If >> yes, how do I do it in R? >> >> Thanks. >> >> Best, >> Jia >> >> -- >> Ohio State University - Finance >> 248 Fisher Hall >> 2100 Neil Ave. >> Columbus, Ohio 43210 >> Telephone: 614-292-2830 >> http://www.fisher.osu.edu/~chen_1002/ >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > Charles C. Berry (858) 534-2098 > Dept of Family/Preventive > Medicine > E mailto:cbe...@tajo.ucsd.edu UC San Diego > http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 > > > -- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I test against a simple null that two regressions coefficients are equal?
OK. Thanks again. I will read the references more. Best, Jia On Thu, Jul 8, 2010 at 10:51 AM, wrote: > hi: no. it's not the same. if you read the paper that I referenced last > night, that explains how to do the following test : > > Ho: R2 = R1 > > H1: R2 != R1 > > that's a different test from what you did but i think it's what you want. > > > > > On Jul 8, 2010, chen jia wrote: > > Thanks, Chuck. I am reading the references, which are helpful. > > Just to understand what I have done wrong here, > > I proposed an alternative testing strategy: > I run regressions y = a3 + b1 * x + b2 * z + e3 and test alternative > hypothesis b1 != b2 against the null hypothesis b1 = b2 in this > equation. > > Is it this the same test as > > y = a1 + b1*x + e1 > y = a2 + b2*x + e2 > test alternative hypothesis b1 != b2 against null hypothesis b1 = b2. > > Best, > Jia > > On Wed, Jul 7, 2010 at 11:12 PM, Charles C. Berry > wrote: >> On Wed, 7 Jul 2010, chen jia wrote: >> >>> Hi there, >>> >>> I run two regressions: >>> >>> y = a1 + b1 * x + e1 >>> y = a2 + b2 * z + e2 >>> >>> I want to test against the null hypothesis: b1 = b2. How do I design the >>> test? >>> >> >> You are testing a non-nested hypothesis, which requires special handling. >> >> The classical test is due to Hotelling, but see the references (and R code >> snippets) in this posting: >> >> http://markmail.org/message/egnowmdzpzjtahy7 >> >> (it is the merest coincidence that the above thread was initiated by Mark >> Leeds and that the URL is 'markmail' :-) ) >> >> HTH, >> >> Chuck >> >> >>> I think I can add two equations together and divide both sides by 2: >>> y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 + e2). >>> or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3 >>> >>> If I run this new regression, I can test against the null b1 = b2 in >>> this regression. Is it an equivalent test as the original one? If >>> yes, how do I do that in R? >>> >>> Alternatively, I think I can just test against the null: >>> correlation(y, x) = correlation(y, z), where correlation(. , .) is the >>> correlation between two random variables. Is this equivalent too? If >>> yes, how do I do it in R? >>> >>> Thanks. >>> >>> Best, >>> Jia >>> >>> -- >>> Ohio State University - Finance >>> 248 Fisher Hall >>> 2100 Neil Ave. >>> Columbus, Ohio 43210 >>> Telephone: 614-292-2830 >>> http://www.fisher.osu.edu/~chen_1002/ >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> Charles C. Berry (858) 534-2098 >> Dept of Family/Preventive >> Medicine >> E mailto:cbe...@tajo.ucsd.edu UC San Diego >> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 >> >> >> > > > > -- > Ohio State University - Finance > 248 Fisher Hall > 2100 Neil Ave. > Columbus, Ohio 43210 > Telephone: 614-292-2830 > http://www.fisher.osu.edu/~chen_1002/ > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I test against a simple null that tworegressions coefficients are equal?
Bert, Thanks for replying. Actually, Chuck and Mark's suggestions already helped, especially regarding to the problem you call type III error. Best, Jia On Thu, Jul 8, 2010 at 11:06 AM, Bert Gunter wrote: > Inline below. > > Bert Gunter > Genentech Nonclinical Statistics > >> -Original Message- >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] >> On Behalf Of markle...@verizon.net >> Sent: Thursday, July 08, 2010 7:52 AM >> To: chen_1...@fisher.osu.edu >> Cc: r-help@r-project.org; cbe...@tajo.ucsd.edu >> Subject: Re: [R] How do I test against a simple null that tworegressions >> coefficients are equal? >> >> >> hi: no. it's not the same. if you read the paper that I referenced last >> night, that explains how to do the following test : >> Ho: R2 = R1 >> H1: R2 != R1 >> that's a different test from what you did but i think it's what you >> want. >> >> On Jul 8, 2010, chen jia wrote: >> >> Thanks, Chuck. I am reading the references, which are helpful. >> Just to understand what I have done wrong here, >> I proposed an alternative testing strategy: >> I run regressions y = a3 + b1 * x + b2 * z + e3 and test alternative >> hypothesis b1 != b2 against the null hypothesis b1 = b2 in this >> equation. > > If this is what you want (see below) try this: > > w <- x + z > > anova(lm(y ~ w), lm(y ~ x + w)) > > > However, as you seem to be somewhat at sea in your basic statistical > knowledge, I suggest you consult a local statistician to make sure whether > you are asking sensible questions in the first place. You appear to be at > high risk for type III error (right answer to wrong question). > > -- Bert > > > > >> Is it this the same test as >> y = a1 + b1*x + e1 >> y = a2 + b2*x + e2 >> test alternative hypothesis b1 != b2 against null hypothesis b1 = b2. >> Best, >> Jia >> On Wed, Jul 7, 2010 at 11:12 PM, Charles C. Berry >> >> wrote: >> > On Wed, 7 Jul 2010, chen jia wrote: >> > >> >> Hi there, >> >> >> >> I run two regressions: >> >> >> >> y = a1 + b1 * x + e1 >> >> y = a2 + b2 * z + e2 >> >> >> >> I want to test against the null hypothesis: b1 = b2.  How do I >> design >> the >> >> test? >> >> >> > >> > You are testing a non-nested hypothesis, which requires special >> handling. >> > >> > The classical test is due to Hotelling, but see the references (and >> R >> code >> > snippets) in this posting: >> > >> >     http://markmail.org/message/egnowmdzpzjtahy7 >> > >> > (it is the merest coincidence that the above thread was initiated >> by >> Mark >> > Leeds and that the URL is 'markmail' :-) ) >> > >> > HTH, >> > >> > Chuck >> > >> > >> >> I think I can add two equations together and divide both sides by >> 2: >> >> y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 >> + >> e2). >> >> or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3 >> >> >> >> If I run this new regression, I can test against the null b1 = b2 >> in >> >> this regression.  Is it an equivalent test as the original one? >> If >> >> yes, how do I do that in R? >> >> >> >> Alternatively, I think I can just test against the null: >> >> correlation(y, x) = correlation(y, z), where correlation(. , .) is >> the >> >> correlation between two random variables. Is this equivalent too? >> If >> >> yes, how do I do it in R? >> >> >> >> Thanks. >> >> >> >> Best, >> >> Jia >> >> >> >> -- >> >>             Ohio State University - Finance >> >>                  248 Fisher Hall >> >>                  2100 Neil Ave. >> >>               Columbus, Ohio  43210 >> >> Â
[R] How do I move axis labels closer to plot box?
Hi there, I place a vector of strings as labels at the tick points by using axis(1,at=seq(0.1,0.7,by=0.1), labels=paste(seq(10,70,by=10),"%",sep=""), tick=FALSE) However, there is a large space between those labels and the boundary of plot box. I want to reduce this space so that the labels appear just next to the boundary of the plot box. How do I do that? Thanks. Best, Jia -- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Regressions with fixed-effect in R
Hi there, Maybe people who know both R and econometrics will be able to answer my questions. I want to run panel regressions in R with fixed-effect. I know two ways to do it. First, I can include factor(grouping_variable) in my regression equation. Second, I plan to subtract group mean from my variables and run OLS panel regression with function lm(). I plan to do it with the second way because the number of groups is large, which incur computational problems inverting large model-matrix. I am interested in the R-squared and adjusted R-squared out of these regressions. Do I need to adjust my R-squared after I run OLS regressions with demeaned variables? Are there any functions that specifically deal with fixed-effects? Thanks. Best, Jia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Pairwise combination
Hi there, I am looking for a function that takes a vector as input and generates all pair wise combination of the elements of the input vector. For example, the input vector is c(1,2,3,4). The output vector is c(1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4). The representation here is generic. I can use a for loop to do it, but it gets slow when the number of elements of the input vector is large. Best, Jia -- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regressions with fixed-effect in R
Thanks. I have the PDF document that you suggest. It is very brief on fixed-effect in panel regressions. I will look into plm package. Best, Jia On Tue, May 11, 2010 at 8:19 AM, Liviu Andronic wrote: > On 5/11/10, chen jia wrote: >> Are there any functions that specifically deal with fixed-effects? >> > Other than plm and its vignette, you may want to check this document [1]. > Liviu > > [1] http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf > -- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pairwise combination
Thanks, everyone. combn function works pretty well for me. Best, Jia On Tue, May 11, 2010 at 5:12 PM, Henrique Dallazuanna wrote: > Or: > > combn(1:4, 2, paste, collapse = ' and ') > > On Tue, May 11, 2010 at 5:54 PM, Ista Zahn wrote: >> >> On Tuesday 11 May 2010 4:27:01 pm David Winsemius wrote: >> > On May 11, 2010, at 1:45 PM, chen jia wrote: >> > > Hi there, >> > > >> > > I am looking for a function that takes a vector as input and generates >> > > all pair wise combination of the elements of the input vector. >> > >> > ?combn >> > >> > > For example, the input vector is c(1,2,3,4). The output vector is c(1 >> > > and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4). >> > >> > That's not a vector. >> >> True, although it is possible the OP wanted paste(combn(1:4, 2)[1,], >> "and", >> combn(1:4, 2)[2,]) >> >> -Ista >> >> > >> > > The >> > > representation here is generic. >> > > >> > > I can use a for loop to do it, but it gets slow when the number of >> > > elements of the input vector is large. >> > > >> > > Best, >> > > Jia >> > >> > David Winsemius, MD >> > West Hartford, CT >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html and provide commented, >> > minimal, self-contained, reproducible code. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > -- > Henrique Dallazuanna > Curitiba-Paraná-Brasil > 25° 25' 40" S 49° 16' 22" O > -- Ohio State University - Finance 248 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 Telephone: 614-292-2830 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Data type problem when extract data from SQLite to R by using RSQLite
Hi there, When I extract data from SQLite to R, the data types (or modes) of the extracted data seems to be determined by the value of the first row. Please see the following example. When I put the missing values first, the column extracted is of the mode character. > str(dbGetQuery(sql.industry, +"select pya_var from annual_data3 + order by pya_var")) 'data.frame': 155465 obs. of 1 variable: $ pya_var: chr NA NA NA NA ... When I put the non-missing values first, the column extracted is of the mode numeric. > str(dbGetQuery(sql.industry, +"select pya_var from annual_data3 + order by pya_var desc")) 'data.frame': 155465 obs. of 1 variable: $ pya_var: num 2.23 2.08 2.04 2 1.86 ... And, if the missing value happens to be the first observation, the whole column is converted to the mode character. > str(dbGetQuery(sql.industry, +"select pya_var from annual_data3")) 'data.frame': 155465 obs. of 1 variable: $ pya_var: chr NA "0.0" "0.000532137017747267" "0.00655147489334259" ... This creates a problem because this column "pya_var" is supposed to be numeric. If it is converted to character, I will encounter errors in subsequent computation. Although I can convert the data back to numeric, but it will be a hassle to check whether it is necessary and which columns to convert every time I extract data from SQLite into R. Do you have any idea how I can make sure the data type stays numeric regardless what value the first observation is? Thanks. Best, Jia -- 700 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data type problem when extract data from SQLite to R by using RSQLite
Hi Seth, Thanks for the reply. I provide info from sessionInfo() and about schema that you ask. Please take a look. The output from sessionInfo() is > sessionInfo() R version 2.12.2 (2011-02-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UT> sessionInfo() R version 2.12.2 (2011-02-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] plyr_1.2.1 RSQLite_0.9-2 DBI_0.2-5 filehash_2.1-1F-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] plyr_1.2.1 RSQLite_0.9-2 DBI_0.2-5 filehash_2.1-1 The .schema of table annual_data3 is sqlite> .schema annual_data3 CREATE TABLE "annual_data3"( PERMNO INT, DATE INT, CUSIP TEXT, EXCHCD INT, SICCD INT, SHROUT INT, PRC REAL, RET REAL, ... pret_var, pRET_sd, nmret, pya_var, pya_sd, nya, pya_var_ebi, pya_sd_ebi, pya_var_ebit, pya_sd_ebit, pya_var_ebitda, pya_sd_ebitda, logage REAL, logasset REAL, ... loglead1stdaret, loglead2stdaret) Table annual_data3 is created by joining table annual_data2 and ya_vol. The column pya_var is initially in ya_vol. dbGetQuery(sql.industry, "create table annual_data3 as select a.*, b.pya_var, b.pya_sd, b.nya, b.pya_var_ebi, b.pya_sd_ebi, b.pya_var_ebit, b.pya_sd_ebit, b.pya_var_ebitda, b.pya_sd_ebitda from annual_data2 as a left join ya_vol as b on a.permno = b.permno and a.year = b.year order by permno, year") Table ya_vol is created by dbGetQuery(sql.industry, "create table ya_vol as select PERMNO, year, variance(ya) as pya_var, stdev(ya) as pya_sd, count(*) as nya, variance(ya_ebi) as pya_var_ebi, stdev(ya_ebi) as pya_sd_ebi, variance(ya_ebit) as pya_var_ebit, stdev(ya_ebit) as pya_sd_ebit, variance(ya_ebitda) as pya_var_ebitda, stdev(ya_ebitda) as pya_sd_ebitda from past_ya where ya is not null group by PERMNO, year order by PERMNO, year") The schema info of ya_vol is sqlite> .schema ya_vol CREATE TABLE ya_vol( PERMNO INT, year INT, pya_var, pya_sd, nya, pya_var_ebi, pya_sd_ebi, pya_var_ebit, pya_sd_ebit, pya_var_ebitda, pya_sd_ebitda ); CREATE INDEX ya_vol_permno_year_idx on ya_vol (permno,year); Interestingly, I find that the problem I reported does not for columns labeled real in the schema info. For example, the type of column RET never changes no matter what the first observation is. > str(dbGetQuery(sql.industry, +"select RET from annual_data3 + where RET is not null limit 5")) 'data.frame': 5 obs. of 1 variable: $ RET: num -0.03354 -0.02113 0.03797 0.0013 -0.00678 > > str(dbGetQuery(sql.industry, +"select RET from annual_data3 + where RET is null limit 5")) 'data.frame': 5 obs. of 1 variable: $ RET: num NA NA NA NA NA > sapply(dbGetQuery(sql.industry, +"select RET from annual_data3 + where RET is null limit 5"), +typeof) RET "double" > sapply(dbGetQuery(sql.industry, +"select RET from annual_data3 + where RET is not null limit 5"), +typeof) RET "double" I still don't know how to solve this problem for variable pya_var, please help. Thanks. Best, Jia On Mon, Feb 28, 2011 at 6:48 PM, Seth Falcon wrote: > Hi Jia, > > On Mon, Feb 28, 2011 at 12:37 PM, chen jia wrote: >> When I extract data from SQLite to R, the data types (or modes) of the >> extracted data seems to be determined by the value of the first row. >> Please see the following example. > > It would help to provide the output of sessionInfo() as well as the > schema definition for the table in SQLite (or at least description of > how it was created). &
Re: [R] Data type problem when extract data from SQLite to R by using RSQLite
Hi Seth, Thanks so much for identifying the problem and explaining everything. I think the first solution that you suggest--make sure the schema has well defined types--would work the best for me. But, I have one question about how to implement it, which is more about sqlite itself. First, I found out that the columns that don't have the expected data types in the table annual_data3 are created by aggregate functions in a separate table. These columns are later combined with other columns that do. I read the link that you provide, http://www.sqlite.org/datatype3.html. One paragraph says "When grouping values with the GROUP BY clause values with different storage classes are considered distinct, except for INTEGER and REAL values which are considered equal if they are numerically equal. No affinities are applied to any values as the result of a GROUP by clause." If I understand it correctly, the columns created by aggregate functions with a GROUP by clause do not have any expected data types. My solution is to use CREATE TABLE clause to declare the expected datatype and then insert the values of columns created by the aggregate functions with the GROUP by clause. However, this solution requires a CREATE TABLE cause every time the aggregate function and the GROUP by clause is used. My question is: Is this the best way to make sure that the columns as a result of a GROUP by clause have the expected data types? Thanks. Best, Jia On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon wrote: > Hi Jia, > > On Mon, Feb 28, 2011 at 6:57 PM, chen jia wrote: >> The .schema of table annual_data3 is >> sqlite> .schema annual_data3 >> CREATE TABLE "annual_data3"( >> PERMNO INT, >> DATE INT, >> CUSIP TEXT, >> EXCHCD INT, >> SICCD INT, >> SHROUT INT, >> PRC REAL, >> RET REAL, >> ... >> pret_var, >> pRET_sd, >> nmret, >> pya_var, > > [snip] > > Is there a reason that you've told SQLite the expected data type for > only some of the columns? > >> Interestingly, I find that the problem I reported does not for columns >> labeled real in the schema info. For example, the type of column RET >> never changes no matter what the first observation is. > > Yes, that is expected and I think it is the solution to your problem: > setup your schema so that all columns have a declared type. For some > details on SQLite's type system see > http://www.sqlite.org/datatype3.html. > > RSQLite currently maps NA values to NULL in the database. Pulling > data out of a SELECT query, RSQLite uses the sqlite3_column_type > SQLite API to determine the data type and map it to an R type. If > NULL is encountered, then the schema is inspected using > sqlite3_column_decltype to attempt to obtain a type. If that fails, > the data is mapped to a character vector at the R level. The type > selection is done once after the first row has been fetched. > > To work around this you can: > > - make sure your schema has well defined > types (which will help SQLite perform its operations); > > - check whether the returned column has the expected type and convert > if needed at the R level. > > - remove NA/NULL values from the db or decide on a different way of > encoding them (e.g you might be able to use -1 in the db in some > situation to indicate missing). Your R code would then need to map > these to proper NA. > > Hope that helps. > > + seth > > > > -- > Seth Falcon | @sfalcon | http://userprimary.net/ > -- 700 Fisher Hall 2100 Neil Ave. Columbus, Ohio 43210 http://www.fisher.osu.edu/~chen_1002/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.