[R] How do I solve a disk I/O error with RSQLite?

2013-02-11 Thread chen jia
Hi there,

I had an error when using RSQLite. My code and the error message are
the following:

> dbGetQuery(sql.dispersion,
+"create table monthly_data_temp as
+ select a.*, b.industry
+ from   monthly_data as a left join
+siccd_industry_ff49 as b
+ on a.siccd = b.siccd
+ order by permno, date")
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (RS_SQLite_exec: could not execute1: disk I/O error)
Calls: dbGetQuery ... dbGetQuery -> sqliteQuickSQL ->
sqliteExecStatement -> .Call
Execution halted

I ran the same code by using sqlite3 from the command line and had no error.

$ sqlite3 dispersion.db
SQLite version 3.7.9 2011-11-01 00:52:41
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table monthly_data_temp as
   ...> select a.*, b.industry
   ...> from   monthly_data as a left join
   ...>siccd_industry_ff49 as b
   ...> on a.siccd = b.siccd
   ...> order by permno, date;
sqlite>

I initialized the connection for SQLite by using

assign("sql.dispersion",
   dbConnect(SQLite(),
 loadable.extensions=TRUE,
 synchronous=0,
 cache_size=20,

dbname="/home/chenj/Fun/Research/Projects/Dispersion/Data/dispersion.db"),
   envir = .GlobalEnv)

and I also loaded an extension:
dbGetQuery(sql.dispersion,
   "select
load_extension('/home/chenj/Fun/SQLite/Extension/LibraryFile64Bit/libsqlitefunctions.so')")
cat("Connected to sql.dispersion",fill=TRUE)

The directory content for the database file is:
$ ls -l
total 34435604
-rw-r--r-- 1 chenj chenj 216 Jun 22  2012 C:\nppdf32Log\debuglog.txt
-rw-r--r-- 1 chenj chenj   26947 Mar 22  2011 Data_Dispersion.ods
drwxr-xr-x 2 chenj chenj4096 Dec  2  2011 Data_from_Hou
drwxr-xr-x 2 chenj chenj  139264 Feb  2 23:30 dispersion
-rw-r--r-- 1 chenj chenj 33468631040 Feb 12 10:55 dispersion.db
-rw-r--r-- 1 chenj chenj   785633114 Nov 22  2009 monthly_data.csv
-rwxr-xr-x 1 chenj chenj  1007551488 Nov  2  2007 monthly_data.sas7bdat

If you could help me find out the cause of this error and solve it, it
would be great!

Please let me know if you need more information!

Best,
Jia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How do I test against a simple null that two regressions coefficients are equal?

2010-07-07 Thread chen jia
Hi there,

I run two regressions:

y = a1 + b1 * x + e1
y = a2 + b2 * z + e2

I want to test against the null hypothesis: b1 = b2.  How do I design the test?

I think I can add two equations together and divide both sides by 2:
y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 + e2).
or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3

If I run this new regression, I can test against the null b1 = b2 in
this regression.  Is it an equivalent test as the original one? If
yes, how do I do that in R?

Alternatively, I think I can just test against the null:
correlation(y, x) = correlation(y, z), where correlation(. , .) is the
correlation between two random variables. Is this equivalent too? If
yes, how do I do it in R?

Thanks.

Best,
Jia

-- 
 Ohio State University - Finance
   248 Fisher Hall
2100 Neil Ave.
  Columbus, Ohio  43210
 Telephone: 614-292-2830
   http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I test against a simple null that two regressions coefficients are equal?

2010-07-08 Thread chen jia
Thanks, Chuck. I am reading the references, which are helpful.

Just to understand what I have done wrong here,

I proposed an alternative testing strategy:
I run regressions y = a3 + b1 * x + b2 * z + e3 and test alternative
hypothesis b1 !=  b2 against the null hypothesis b1 = b2 in this
equation.

Is it this the same test as

y = a1 + b1*x + e1
y = a2 + b2*x + e2
test alternative hypothesis b1 != b2 against null hypothesis b1 = b2.

Best,
Jia

On Wed, Jul 7, 2010 at 11:12 PM, Charles C. Berry  wrote:
> On Wed, 7 Jul 2010, chen jia wrote:
>
>> Hi there,
>>
>> I run two regressions:
>>
>> y = a1 + b1 * x + e1
>> y = a2 + b2 * z + e2
>>
>> I want to test against the null hypothesis: b1 = b2.  How do I design the
>> test?
>>
>
> You are testing a non-nested hypothesis, which requires special handling.
>
> The classical test is due to Hotelling, but see the references (and R code
> snippets) in this posting:
>
>        http://markmail.org/message/egnowmdzpzjtahy7
>
> (it is the merest coincidence that the above thread was initiated by Mark
> Leeds and that the URL is 'markmail' :-) )
>
> HTH,
>
> Chuck
>
>
>> I think I can add two equations together and divide both sides by 2:
>> y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 + e2).
>> or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3
>>
>> If I run this new regression, I can test against the null b1 = b2 in
>> this regression.  Is it an equivalent test as the original one? If
>> yes, how do I do that in R?
>>
>> Alternatively, I think I can just test against the null:
>> correlation(y, x) = correlation(y, z), where correlation(. , .) is the
>> correlation between two random variables. Is this equivalent too? If
>> yes, how do I do it in R?
>>
>> Thanks.
>>
>> Best,
>> Jia
>>
>> --
>>                        Ohio State University - Finance
>>                                  248 Fisher Hall
>>                                   2100 Neil Ave.
>>                             Columbus, Ohio  43210
>>                            Telephone: 614-292-2830
>>                      http://www.fisher.osu.edu/~chen_1002/
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> Charles C. Berry                            (858) 534-2098
>                                            Dept of Family/Preventive
> Medicine
> E mailto:cbe...@tajo.ucsd.edu               UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>



-- 
 Ohio State University - Finance
   248 Fisher Hall
2100 Neil Ave.
  Columbus, Ohio  43210
 Telephone: 614-292-2830
   http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I test against a simple null that two regressions coefficients are equal?

2010-07-08 Thread chen jia
OK. Thanks again.

I will read the references more.

Best,
Jia

On Thu, Jul 8, 2010 at 10:51 AM,   wrote:
> hi: no. it's not the same. if you read the paper that I referenced last
> night, that explains how to do the following  test :
>
> Ho: R2 = R1
>
> H1: R2 != R1
>
> that's a different test from what you did but i think it's what you want.
>
>
>
>
> On Jul 8, 2010, chen jia  wrote:
>
> Thanks, Chuck. I am reading the references, which are helpful.
>
> Just to understand what I have done wrong here,
>
> I proposed an alternative testing strategy:
> I run regressions y = a3 + b1 * x + b2 * z + e3 and test alternative
> hypothesis b1 != b2 against the null hypothesis b1 = b2 in this
> equation.
>
> Is it this the same test as
>
> y = a1 + b1*x + e1
> y = a2 + b2*x + e2
> test alternative hypothesis b1 != b2 against null hypothesis b1 = b2.
>
> Best,
> Jia
>
> On Wed, Jul 7, 2010 at 11:12 PM, Charles C. Berry 
> wrote:
>> On Wed, 7 Jul 2010, chen jia wrote:
>>
>>> Hi there,
>>>
>>> I run two regressions:
>>>
>>> y = a1 + b1 * x + e1
>>> y = a2 + b2 * z + e2
>>>
>>> I want to test against the null hypothesis: b1 = b2.  How do I design the
>>> test?
>>>
>>
>> You are testing a non-nested hypothesis, which requires special handling.
>>
>> The classical test is due to Hotelling, but see the references (and R code
>> snippets) in this posting:
>>
>>        http://markmail.org/message/egnowmdzpzjtahy7
>>
>> (it is the merest coincidence that the above thread was initiated by Mark
>> Leeds and that the URL is 'markmail' :-) )
>>
>> HTH,
>>
>> Chuck
>>
>>
>>> I think I can add two equations together and divide both sides by 2:
>>> y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1 + e2).
>>> or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3
>>>
>>> If I run this new regression, I can test against the null b1 = b2 in
>>> this regression.  Is it an equivalent test as the original one? If
>>> yes, how do I do that in R?
>>>
>>> Alternatively, I think I can just test against the null:
>>> correlation(y, x) = correlation(y, z), where correlation(. , .) is the
>>> correlation between two random variables. Is this equivalent too? If
>>> yes, how do I do it in R?
>>>
>>> Thanks.
>>>
>>> Best,
>>> Jia
>>>
>>> --
>>>                        Ohio State University - Finance
>>>                                  248 Fisher Hall
>>>                                   2100 Neil Ave.
>>>                             Columbus, Ohio  43210
>>>                            Telephone: 614-292-2830
>>>                      http://www.fisher.osu.edu/~chen_1002/
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Charles C. Berry                            (858) 534-2098
>>                                            Dept of Family/Preventive
>> Medicine
>> E mailto:cbe...@tajo.ucsd.edu               UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>
>>
>>
>
>
>
> --
> Ohio State University - Finance
> 248 Fisher Hall
> 2100 Neil Ave.
> Columbus, Ohio 43210
> Telephone: 614-292-2830
> http://www.fisher.osu.edu/~chen_1002/
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
 Ohio State University - Finance
   248 Fisher Hall
2100 Neil Ave.
  Columbus, Ohio  43210
 Telephone: 614-292-2830
   http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I test against a simple null that tworegressions coefficients are equal?

2010-07-08 Thread chen jia
Bert,

Thanks for replying. Actually, Chuck and Mark's suggestions already
helped, especially regarding to the problem you call type III error.

Best,
Jia

On Thu, Jul 8, 2010 at 11:06 AM, Bert Gunter  wrote:
> Inline below.
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of markle...@verizon.net
>> Sent: Thursday, July 08, 2010 7:52 AM
>> To: chen_1...@fisher.osu.edu
>> Cc: r-help@r-project.org; cbe...@tajo.ucsd.edu
>> Subject: Re: [R] How do I test against a simple null that tworegressions
>> coefficients are equal?
>>
>>
>>    hi: no. it's not the same. if you read the paper that I referenced last
>>    night, that explains how to do the following  test :
>>    Ho: R2 = R1
>>    H1: R2 != R1
>>    that's a different test from what you did but i think it's what you
>> want.
>>
>>    On Jul 8, 2010, chen jia  wrote:
>>
>>      Thanks, Chuck. I am reading the references, which are helpful.
>>      Just to understand what I have done wrong here,
>>      I proposed an alternative testing strategy:
>>      I run regressions y = a3 + b1 * x + b2 * z + e3 and test alternative
>>      hypothesis b1 != b2 against the null hypothesis b1 = b2 in this
>>      equation.
>
> If this is what you want (see below) try this:
>
> w <- x + z
>
> anova(lm(y ~ w), lm(y ~ x + w))
>
>
> However, as you seem to be somewhat at sea in your basic statistical
> knowledge, I suggest you consult a local statistician to make sure whether
> you are asking sensible questions in the first place. You appear to be at
> high risk for type III error (right answer to wrong question).
>
> -- Bert
>
>
>
>
>>      Is it this the same test as
>>      y = a1 + b1*x + e1
>>      y = a2 + b2*x + e2
>>      test alternative hypothesis b1 != b2 against null hypothesis b1 = b2.
>>      Best,
>>      Jia
>>      On Wed, Jul 7, 2010 at 11:12 PM, Charles C. Berry
>> 
>>      wrote:
>>      > On Wed, 7 Jul 2010, chen jia wrote:
>>      >
>>      >> Hi there,
>>      >>
>>      >> I run two regressions:
>>      >>
>>      >> y = a1 + b1 * x + e1
>>      >> y = a2 + b2 * z + e2
>>      >>
>>      >> I want to test against the null hypothesis: b1 = b2. Â How do I
>> design
>>      the
>>      >> test?
>>      >>
>>      >
>>      >  You  are testing a non-nested hypothesis, which requires special
>>      handling.
>>      >
>>      > The classical test is due to Hotelling, but see the references (and
>> R
>>      code
>>      > snippets) in this posting:
>>      >
>>      > Â  Â  Â  Â http://markmail.org/message/egnowmdzpzjtahy7
>>      >
>>      > (it is the merest coincidence that the above thread was initiated
>> by
>>      Mark
>>      > Leeds and that the URL is 'markmail' :-) )
>>      >
>>      > HTH,
>>      >
>>      > Chuck
>>      >
>>      >
>>      >> I think I can add two equations together and divide both sides by
>> 2:
>>      >> y = 0.5*(a1+a2) + 0.5*b1 * x + 0.5*b2 * z + e3, where e3 = 0.5*(e1
>> +
>>      e2).
>>      >> or just y = a3 + 0.5*b1 * x + 0.5*b2 * z + e3
>>      >>
>>      >> If I run this new regression, I can test against the null b1 = b2
>> in
>>      >> this regression. Â Is it an equivalent test as the original one?
>> If
>>      >> yes, how do I do that in R?
>>      >>
>>      >> Alternatively, I think I can just test against the null:
>>      >> correlation(y, x) = correlation(y, z), where correlation(. , .) is
>> the
>>      >> correlation between two random variables. Is this equivalent too?
>> If
>>      >> yes, how do I do it in R?
>>      >>
>>      >> Thanks.
>>      >>
>>      >> Best,
>>      >> Jia
>>      >>
>>      >> --
>>      >> Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â Ohio State University - Finance
>>      >> Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â 248 Fisher Hall
>>      >> Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  2100 Neil Ave.
>>      >>   Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Columbus, Ohio  43210
>>      >> Â 

[R] How do I move axis labels closer to plot box?

2010-07-12 Thread chen jia
Hi there,

I place a vector of strings as labels at the tick points by using

axis(1,at=seq(0.1,0.7,by=0.1),
labels=paste(seq(10,70,by=10),"%",sep=""), tick=FALSE)

However, there is a large space between those labels and the boundary
of plot box. I want to reduce this space so that the labels appear
just next to the boundary of the plot box. How do I do that?

Thanks.

Best,
Jia

-- 
                         Ohio State University - Finance
                                   248 Fisher Hall
                                    2100 Neil Ave.
                              Columbus, Ohio  43210
                             Telephone: 614-292-2830
                       http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regressions with fixed-effect in R

2010-05-10 Thread chen jia
Hi there,

Maybe people who know both R and econometrics will be able to answer
my questions.

I want to run panel regressions in R with fixed-effect. I know two
ways to do it.
First, I can include factor(grouping_variable) in my regression equation.
Second, I plan to subtract group mean from my variables and run OLS
panel regression with function lm().

I plan to do it with the second way because the number of groups is
large, which incur computational problems inverting large
model-matrix.

I am interested in the R-squared and adjusted R-squared out of these
regressions.

Do I need to adjust my R-squared after I run OLS regressions with
demeaned variables?

Are there any functions that specifically deal with fixed-effects?

Thanks.

Best,
Jia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Pairwise combination

2010-05-11 Thread chen jia
Hi there,

I am looking for a function that takes a vector as input and generates
all pair wise combination of the elements of the input vector.

For example, the input vector is c(1,2,3,4). The output vector is c(1
and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4). The
representation here is generic.

I can use a for loop to do it, but it gets slow when the number of
elements of the input vector is large.

Best,
Jia

-- 
 Ohio State University - Finance
   248 Fisher Hall
2100 Neil Ave.
  Columbus, Ohio  43210
 Telephone: 614-292-2830
   http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regressions with fixed-effect in R

2010-05-11 Thread chen jia
Thanks. I have the PDF document that you suggest. It is very brief on
fixed-effect in panel regressions.

I will look into plm package.

Best,
Jia

On Tue, May 11, 2010 at 8:19 AM, Liviu Andronic  wrote:
> On 5/11/10, chen jia  wrote:
>>  Are there any functions that specifically deal with fixed-effects?
>>
> Other than plm and its vignette, you may want to check this document [1].
> Liviu
>
> [1] http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf
>



-- 
 Ohio State University - Finance
   248 Fisher Hall
2100 Neil Ave.
  Columbus, Ohio  43210
 Telephone: 614-292-2830
   http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pairwise combination

2010-05-11 Thread chen jia
Thanks, everyone.

combn function works pretty well for me.

Best,
Jia

On Tue, May 11, 2010 at 5:12 PM, Henrique Dallazuanna  wrote:
> Or:
>
> combn(1:4, 2, paste, collapse = ' and ')
>
> On Tue, May 11, 2010 at 5:54 PM, Ista Zahn  wrote:
>>
>> On Tuesday 11 May 2010 4:27:01 pm David Winsemius wrote:
>> > On May 11, 2010, at 1:45 PM, chen jia wrote:
>> > > Hi there,
>> > >
>> > > I am looking for a function that takes a vector as input and generates
>> > > all pair wise combination of the elements of the input vector.
>> >
>> > ?combn
>> >
>> > > For example, the input vector is c(1,2,3,4). The output vector is c(1
>> > > and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, 3 and 4).
>> >
>> > That's not a vector.
>>
>> True, although it is possible the OP wanted paste(combn(1:4, 2)[1,],
>> "and",
>> combn(1:4, 2)[2,])
>>
>> -Ista
>>
>> >
>> > > The
>> > > representation here is generic.
>> > >
>> > > I can use a for loop to do it, but it gets slow when the number of
>> > > elements of the input vector is large.
>> > >
>> > > Best,
>> > > Jia
>> >
>> > David Winsemius, MD
>> > West Hartford, CT
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html and provide commented,
>> > minimal, self-contained, reproducible code.
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



-- 
 Ohio State University - Finance
   248 Fisher Hall
2100 Neil Ave.
  Columbus, Ohio  43210
 Telephone: 614-292-2830
   http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Data type problem when extract data from SQLite to R by using RSQLite

2011-02-28 Thread chen jia
Hi there,

When I extract data from SQLite to R, the data types (or modes) of the
extracted data seems to be determined by the value of the first row.
Please see the following example.

When I put the missing values first, the column extracted is of the
mode character.
> str(dbGetQuery(sql.industry,
+"select pya_var from annual_data3
+ order by pya_var"))
'data.frame':   155465 obs. of  1 variable:
 $ pya_var: chr  NA NA NA NA ...

When I put the non-missing values first, the column extracted is of
the mode numeric.
> str(dbGetQuery(sql.industry,
+"select pya_var from annual_data3
+ order by pya_var desc"))
'data.frame':   155465 obs. of  1 variable:
 $ pya_var: num  2.23 2.08 2.04 2 1.86 ...

And, if the missing value happens to be the first observation, the
whole column is converted to the mode character.
> str(dbGetQuery(sql.industry,
+"select pya_var from annual_data3"))
'data.frame':   155465 obs. of  1 variable:
 $ pya_var: chr  NA "0.0" "0.000532137017747267" "0.00655147489334259" ...

This creates a problem because this column "pya_var" is supposed to be
numeric. If it is converted to character, I will encounter errors in
subsequent computation. Although I can convert the data back to
numeric, but it will be a hassle to check whether it is necessary and
which columns to convert every time I extract data from SQLite into R.

Do you have any idea how I can make sure the data type stays numeric
regardless what value the first observation is? Thanks.

Best,
Jia

-- 
700 Fisher Hall
2100 Neil Ave.
Columbus, Ohio  43210
http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data type problem when extract data from SQLite to R by using RSQLite

2011-02-28 Thread chen jia
Hi Seth,

Thanks for the reply. I provide info from sessionInfo() and about
schema that you ask. Please take a look.

The output from sessionInfo() is
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UT> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] plyr_1.2.1 RSQLite_0.9-2  DBI_0.2-5  filehash_2.1-1F-8
  LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] plyr_1.2.1 RSQLite_0.9-2  DBI_0.2-5  filehash_2.1-1

The .schema of table annual_data3 is
sqlite> .schema annual_data3
CREATE TABLE "annual_data3"(
  PERMNO INT,
  DATE INT,
  CUSIP TEXT,
  EXCHCD INT,
  SICCD INT,
  SHROUT INT,
  PRC REAL,
  RET REAL,
  ...
  pret_var,
  pRET_sd,
  nmret,
  pya_var,
  pya_sd,
  nya,
  pya_var_ebi,
  pya_sd_ebi,
  pya_var_ebit,
  pya_sd_ebit,
  pya_var_ebitda,
  pya_sd_ebitda,
  logage REAL,
  logasset REAL,
  ...
  loglead1stdaret, loglead2stdaret)

Table annual_data3 is created by joining table annual_data2 and
ya_vol. The column pya_var is initially in ya_vol.

dbGetQuery(sql.industry,
   "create table annual_data3 as
select a.*, b.pya_var, b.pya_sd,
   b.nya, b.pya_var_ebi, b.pya_sd_ebi,
   b.pya_var_ebit, b.pya_sd_ebit,
   b.pya_var_ebitda, b.pya_sd_ebitda
from   annual_data2 as a left join
  ya_vol as b
on a.permno = b.permno and
 a.year = b.year
order by permno, year")

Table ya_vol is created by
dbGetQuery(sql.industry,
   "create table ya_vol as
select PERMNO, year,
   variance(ya) as pya_var,
   stdev(ya) as pya_sd,
   count(*) as nya,
   variance(ya_ebi) as pya_var_ebi,
   stdev(ya_ebi) as pya_sd_ebi,
   variance(ya_ebit) as pya_var_ebit,
   stdev(ya_ebit) as pya_sd_ebit,
   variance(ya_ebitda) as pya_var_ebitda,
   stdev(ya_ebitda) as pya_sd_ebitda
from   past_ya
where  ya is not null
group by PERMNO, year
order by PERMNO, year")

The schema info of ya_vol is
sqlite> .schema ya_vol
CREATE TABLE ya_vol(
  PERMNO INT,
  year INT,
  pya_var,
  pya_sd,
  nya,
  pya_var_ebi,
  pya_sd_ebi,
  pya_var_ebit,
  pya_sd_ebit,
  pya_var_ebitda,
  pya_sd_ebitda
);
CREATE INDEX ya_vol_permno_year_idx
on ya_vol (permno,year);

Interestingly, I find that the problem I reported does not for columns
labeled real in the schema info. For example, the type of column RET
never changes no matter what the first observation is.

> str(dbGetQuery(sql.industry,
+"select RET from annual_data3
+ where RET is not null limit 5"))
'data.frame':   5 obs. of  1 variable:
 $ RET: num  -0.03354 -0.02113 0.03797 0.0013 -0.00678
>
> str(dbGetQuery(sql.industry,
+"select RET from annual_data3
+ where RET is null limit 5"))
'data.frame':   5 obs. of  1 variable:
 $ RET: num  NA NA NA NA NA
> sapply(dbGetQuery(sql.industry,
+"select RET from annual_data3
+ where RET is null limit 5"),
+typeof)
 RET
"double"
> sapply(dbGetQuery(sql.industry,
+"select RET from annual_data3
+ where RET is not null limit 5"),
+typeof)
 RET
"double"

I still don't know how to solve this problem for variable pya_var,
please help. Thanks.

Best,
Jia


On Mon, Feb 28, 2011 at 6:48 PM, Seth Falcon  wrote:
> Hi Jia,
>
> On Mon, Feb 28, 2011 at 12:37 PM, chen jia  wrote:
>> When I extract data from SQLite to R, the data types (or modes) of the
>> extracted data seems to be determined by the value of the first row.
>> Please see the following example.
>
> It would help to provide the output of sessionInfo() as well as the
> schema definition for the table in SQLite (or at least description of
> how it was created).
&

Re: [R] Data type problem when extract data from SQLite to R by using RSQLite

2011-03-01 Thread chen jia
Hi Seth,

Thanks so much for identifying the problem and explaining everything.
I think the first solution that you suggest--make sure the schema has
well defined types--would work the best for me. But, I have one
question about how to implement it, which is more about sqlite itself.

First, I found out that the columns that don't have the expected data
types in the table annual_data3 are created by aggregate functions in
a separate table. These columns are later combined with other columns
that do.

I read the link that you provide,
http://www.sqlite.org/datatype3.html. One paragraph says "When
grouping values with the GROUP BY clause values with different storage
classes are considered distinct, except for INTEGER and REAL values
which are considered equal if they are numerically equal. No
affinities are applied to any values as the result of a GROUP by
clause."

If I understand it correctly, the columns created by aggregate
functions with a GROUP by clause do not have any expected data types.

My solution is to use CREATE TABLE clause to declare the expected
datatype and then insert the values of columns created by the
aggregate functions with the GROUP by clause. However, this solution
requires a CREATE TABLE cause every time the aggregate function and
the GROUP by clause is used.

My question is: Is this the best way to make sure that the columns as
a result of a GROUP by clause have the expected data types? Thanks.

Best,
Jia

On Tue, Mar 1, 2011 at 1:16 AM, Seth Falcon  wrote:
> Hi Jia,
>
> On Mon, Feb 28, 2011 at 6:57 PM, chen jia  wrote:
>> The .schema of table annual_data3 is
>> sqlite> .schema annual_data3
>> CREATE TABLE "annual_data3"(
>>  PERMNO INT,
>>  DATE INT,
>>  CUSIP TEXT,
>>  EXCHCD INT,
>>  SICCD INT,
>>  SHROUT INT,
>>  PRC REAL,
>>  RET REAL,
>>  ...
>>  pret_var,
>>  pRET_sd,
>>  nmret,
>>  pya_var,
>
> [snip]
>
> Is there a reason that you've told SQLite the expected data type for
> only some of the columns?
>
>> Interestingly, I find that the problem I reported does not for columns
>> labeled real in the schema info. For example, the type of column RET
>> never changes no matter what the first observation is.
>
> Yes, that is expected and I think it is the solution to your problem:
> setup your schema so that all columns have a declared type.  For some
> details on SQLite's type system see
> http://www.sqlite.org/datatype3.html.
>
> RSQLite currently maps NA values to NULL in the database.  Pulling
> data out of a SELECT query, RSQLite uses the sqlite3_column_type
> SQLite API to determine the data type and map it to an R type.  If
> NULL is encountered, then the schema is inspected using
> sqlite3_column_decltype to attempt to obtain a type.  If that fails,
> the data is mapped to a character vector at the R level.  The type
> selection is done once after the first row has been fetched.
>
> To work around this you can:
>
> - make sure your schema has well defined
>  types (which will help SQLite perform its operations);
>
> - check whether the returned column has the expected type and convert
>  if needed at the R level.
>
> - remove NA/NULL values from the db or decide on a different way of
>  encoding them (e.g you might be able to use -1 in the db in some
>  situation to indicate missing).  Your R code would then need to map
>  these to proper NA.
>
> Hope that helps.
>
> + seth
>
>
>
> --
> Seth Falcon | @sfalcon | http://userprimary.net/
>



-- 
700 Fisher Hall
2100 Neil Ave.
Columbus, Ohio  43210
http://www.fisher.osu.edu/~chen_1002/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.