rlearner309 wrote:
I think it is zero, because you have lots of zeros there. It is not like
continuous variables.
Think again. The sum of products may be zero, but that is not the
covariance. And don't dismiss Thomas, he is usually right.
Anyway, the coefficients of dummy variables represent differences from
the same base level, and choosing a poorly determined base level
(essentially: one whose mean is determined by only a few observations)
will cause high parameter correlation. It should only affect those
parameters though, and it is not really clear what VIF means for dummy
variables. One often chooses to relevel() to make the largest group the
base level, but it really comes down to which group contrasts you want
to look at.
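The effect of the base level can be seen in a small simulation. This is only a sketch on made-up data: the group names ("rare", "big1", "big2"), the group sizes and the random response are assumptions for illustration, not the original poster's data.

```r
set.seed(1)  # simulated response, purely illustrative

# Hypothetical data: level "rare" has 5 observations, the others 1000 each.
g <- factor(rep(c("rare", "big1", "big2"), times = c(5, 1000, 1000)),
            levels = c("rare", "big1", "big2"))
y <- rnorm(length(g))

# Base level = "rare": both dummy coefficients are differences from a
# mean that is estimated from only 5 observations.
fit_rare <- lm(y ~ g)

# Base level = "big1": relevel() makes a well-determined group the reference.
g2 <- relevel(g, ref = "big1")
fit_big <- lm(y ~ g2)

# Correlation between the two dummy coefficients under each coding.
cor_rare <- cov2cor(vcov(fit_rare))["gbig1", "gbig2"]
cor_big  <- cov2cor(vcov(fit_big))["g2rare", "g2big2"]

print(c(rare_as_base = cor_rare, big1_as_base = cor_big))

# Standard errors: releveling does not rescue the contrast involving "rare".
print(summary(fit_rare)$coefficients[, "Std. Error"])
print(summary(fit_big)$coefficients[, "Std. Error"])
```

With the rare group as base the two coefficients are almost perfectly correlated; after releveling the correlation is small, but the standard error of the contrast that involves the rare group stays large, since no coding adds observations to that group.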
Thomas Lumley wrote:
On Wed, 2 Jul 2008, rlearner309 wrote:
I think the covariance between dummy variables or between dummy variables
and intercept should always be zero. Meaning: no singularity problem??
No. You can easily check that this is not true using the cov() function.
Indicator variables for mutually exclusive groups are negatively
correlated.
-thomas
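The check with cov() might look like the following sketch. The factor and its group sizes are invented for illustration; only cov() and cor() themselves come from the message above.

```r
# Hypothetical three-level factor with mutually exclusive groups.
g <- factor(rep(c("a", "b", "c"), times = c(5, 1000, 1000)))

# Indicator (dummy) variables for levels "b" and "c".
d_b <- as.numeric(g == "b")
d_c <- as.numeric(g == "c")

# No observation belongs to both groups, so the raw sum of products is zero ...
sum_prod <- sum(d_b * d_c)

# ... but the covariance subtracts the product of the means,
# which leaves a negative value.
cov_bc <- cov(d_b, d_c)
cor_bc <- cor(d_b, d_c)

print(c(sum_prod = sum_prod, cov = cov_bc, cor = cor_bc))
```

Because nearly every observation is in either "b" or "c", knowing one indicator almost determines the other, and the correlation is strongly negative.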
rlearner309 wrote:
This is actually more like a Statistics problem:
I have a dataset with two dummy variables controlling three levels. The
problem is, one level does not have many observations compared with the
other two levels (a couple of data points compared with 1000+ points on
the other levels). When I run the regression, the result is bad. I have
unbalanced SE and VIF. Does this kind of problem also belong to the
"near singularity" problem? Does it make any difference if I code the
level that lacks data (0,0) instead of (0,1)?
thanks a lot!
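The coding question can be explored numerically. The sketch below uses made-up group sizes and a hypothetical helper, vif_two(), that computes the VIF of one dummy from the R-squared of regressing it on the other dummy (1 / (1 - R^2)); it is not a function from any package.

```r
# Hypothetical helper: VIF of dummy d1 given dummy d2, via 1 / (1 - R^2).
vif_two <- function(d1, d2) 1 / (1 - summary(lm(d1 ~ d2))$r.squared)

# Made-up group sizes: a handful of points in "rare", 1000 in each other level.
g <- rep(c("rare", "big1", "big2"), times = c(5, 1000, 1000))

# Coding A: the rare level is (0,0), so the dummies mark the two large levels.
vif_a <- vif_two(as.numeric(g == "big1"), as.numeric(g == "big2"))

# Coding B: the rare level is (0,1), so a large level ("big1") is the (0,0) base.
vif_b <- vif_two(as.numeric(g == "big2"), as.numeric(g == "rare"))

print(c(rare_as_base = vif_a, large_as_base = vif_b))
```

Under these assumptions the VIF is very large when the rare level is the (0,0) base and close to 1 when a large level is the base, so the coding does change the VIF even though the fitted model is the same.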
--
View this message in context:
http://www.nabble.com/A-regression-problem-using-dummy-variables-tp18214377p18237666.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thomas Lumley Assoc. Professor, Biostatistics
[EMAIL PROTECTED] University of Washington, Seattle
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907