This is probably better for Cross Validated [https://stats.stackexchange.com]. Surprisingly, I can't quickly find an answered question on this topic. My "tl;dr" answer would be: "inflated" relative to what? Compared with a balanced sample of the same total size, an unbalanced sample certainly decreases the *power* of an analysis, but there's nothing 'incorrect' (AFAICS) with the estimated SEs, and no reason to try to fix them.

https://stats.stackexchange.com/questions/23108/unbalanced-design-effect

https://stats.stackexchange.com/questions/347050/unbalanced-sample-in-dummy-variable-for-ols-linear-regression
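To make "inflated relative to what?" concrete, here is a minimal sketch of the textbook OLS result. It assumes the model contains only an intercept and the 0/1 dummy x_i, with homoskedastic errors of variance sigma^2. The symbols n_1 (count in the small category), n_0 = n - n_1 and p = n_1/n are notation introduced here for illustration:

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad x_i \in \{0, 1\}, \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2

\operatorname{Var}(\hat\beta_1 \mid x)
  = \frac{\sigma^2}{\sum_i (x_i - \bar x)^2}
  = \sigma^2 \left( \frac{1}{n_0} + \frac{1}{n_1} \right)
  = \frac{\sigma^2}{n\,p\,(1-p)}

\frac{\operatorname{SE}_{p = 0.05}}{\operatorname{SE}_{p = 0.5}}
  = \sqrt{\frac{0.5 \cdot 0.5}{0.05 \cdot 0.95}} \approx 2.29
```

So with a 95/5 split the SE is roughly 2.3 times what a 50/50 split of the same n would give. However, the usual estimate s^2 (1/n_0 + 1/n_1), with s^2 the residual variance, is still unbiased for that larger variance once we condition on the observed x. That holds whether the imbalance arose by chance or reflects the population.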

On 8/24/24 14:15, Jeff Newmiller via R-help wrote:
You say you asked elsewhere, but so many hits come up when I just search for
"unbalanced sample size" that your justification for not following the posting
guide does not seem honest.

I also recall that various discussions of statistical power in basic statistics
texts address this.
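As a rough illustration of the power point, here is a sketch using the normal approximation. It assumes a single 0/1 dummy as the only regressor and a known error SD. The function name dummy_power and the numbers in the usage note are hypothetical, chosen only to show the contrast at a fixed total sample size:

```python
import math
from statistics import NormalDist


def dummy_power(n, p, delta, sigma=1.0, alpha=0.05):
    """Approximate power of a two-sided z-test of a 0/1 dummy coefficient.

    Hypothetical helper. Assumes the dummy is the only regressor, a fraction
    p of the n observations are in category 1, the error SD sigma is known,
    and the true difference between the two category means is delta.
    """
    # SE of the slope for a single binary regressor: sigma / sqrt(n p (1 - p))
    se = sigma / math.sqrt(n * p * (1 - p))
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Probability that the z statistic lands in either rejection region
    return std_normal.cdf(shift - crit) + std_normal.cdf(-shift - crit)
```

Under these assumptions, dummy_power(200, 0.5, 0.5) comes out at about 0.94 and dummy_power(200, 0.05, 0.5) at about 0.34. The sample size and the effect are the same, but the 95/5 split has much less power.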

On August 24, 2024 11:05:12 AM PDT, Christofer Bogaso 
<bogaso.christo...@gmail.com> wrote:
Hi,

I have asked this question elsewhere but failed to get any
response, so I am hoping to get some insight from the experts and
statisticians here.

Let's say we are fitting a regression equation where one explanatory
variable is categorical with 2 categories. However, in the sample one
category has 95% of the values and the other has just 5%. That is, the
categories are highly unbalanced.

Typically, the SE of the estimated coefficient may be inflated for such a
highly unbalanced categorical explanatory variable.

Such an unbalanced case may arise from 2 scenarios: 1) there is a flaw in
the sample, or it is just by chance that the second category has only 5% of
the values in the sample; or 2) in the population itself, the second category
has a very small number of occurrences, which is reflected in the sample.

My question is: how would the SE be affected in the above 2 cases? Would the
impact be the same, i.e., would we get an incorrect estimate of the SE in both
cases? If so, is there any way to show this analytically, or perhaps by
simulation?
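On the simulation question, here is a minimal sketch in Python using only the standard library. The helper names ols_dummy_fit and simulate, and all parameter values, are hypothetical choices for illustration. It assumes homoskedastic normal errors and a single 0/1 regressor. Scenario 1 is mimicked by holding the 95/5 split fixed across replications. Scenario 2 is mimicked by drawing each unit's category from a population in which it has probability 0.05. In each case the sketch compares the Monte Carlo SD of the slope estimates with the average reported SE:

```python
import math
import random
import statistics


def ols_dummy_fit(x, y):
    """OLS slope and classical SE for y ~ intercept + 0/1 dummy x.

    With one binary regressor the OLS slope equals the difference in group
    means, and the classical SE is s * sqrt(1/n0 + 1/n1), where s^2 is the
    residual variance on n - 2 degrees of freedom.
    """
    g1 = [yi for xi, yi in zip(x, y) if xi == 1]
    g0 = [yi for xi, yi in zip(x, y) if xi == 0]
    n1, n0 = len(g1), len(g0)
    m1, m0 = statistics.fmean(g1), statistics.fmean(g0)
    rss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    s2 = rss / (n1 + n0 - 2)
    return m1 - m0, math.sqrt(s2 * (1 / n1 + 1 / n0))


def simulate(n=1000, p=0.05, beta0=1.0, beta1=0.5, sigma=1.0,
             reps=1000, random_design=False, seed=1):
    """Monte Carlo check of the dummy SE (hypothetical helper).

    Assumption (illustrative): homoskedastic normal errors with SD sigma.
    """
    rng = random.Random(seed)
    n1_fixed = round(n * p)
    estimates, ses = [], []
    for _ in range(reps):
        if random_design:
            # Scenario 2: the category is rare in the population, so each
            # sample draws membership with probability p and n1 varies.
            while True:
                x = [1 if rng.random() < p else 0 for _ in range(n)]
                if 0 < sum(x) < n:  # need both categories present
                    break
        else:
            # Scenario 1: condition on the observed 95/5 split, however it
            # arose, by keeping exactly n1_fixed units in the small category.
            x = [1] * n1_fixed + [0] * (n - n1_fixed)
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b, se = ols_dummy_fit(x, y)
        estimates.append(b)
        ses.append(se)
    return {
        "empirical_sd": statistics.stdev(estimates),  # true sampling SD
        "mean_reported_se": statistics.fmean(ses),  # classical OLS SE
        "theory_se": sigma / math.sqrt(n * p * (1 - p)),
    }
```

Calling simulate(random_design=False) and simulate(random_design=True) should show, in both scenarios, the average reported SE close to the Monte Carlo SD of the estimates (so the SE is not "incorrect"). Both should sit near sigma / sqrt(n p (1 - p)), which is larger than for a balanced design of the same n. The sketch assumes equal error variance in both categories and says nothing about the heteroskedastic case.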

My apologies, as this question is not directly R-related. However, I
just wanted to get some insight on the above statistics problem from
some of the great statisticians in this forum.
Thanks for your time.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
