Dear Mr Mejias,
Predicted probabilities from logistic regression, given the regressors X,
should agree with the observed relative frequencies given X, because in
logistic regression E(Y | X) = P(Y = 1 | X) = 1/(1+exp(-x'β))
(response function). The maximum likelihood estimator of E(Y | X) equals
M(Y | X), i.e. the mean of Y given the regressors X, which is equal to
the observed relative frequency of Y given X because Y is a binary
variable. Hence, M(Y | X) = f(Y=1 | X) = estimated P(Y = 1 | X) =
1/(1+exp(-x' estimated β)). Averaging over X gives
M(M(Y | X)) = M(Y) = f(Y=1) by the law of iterated expectations, so, for
a model with an intercept, the mean of the predicted probabilities equals
the observed proportion of Y = 1. You will get the same result if you
estimate the unconditional P(Y = 1) = E(Y) directly by maximum likelihood
estimation. This can be shown by calculus: likelihood function ->
log-likelihood -> first derivative of the log-likelihood with respect to
P(Y=1) -> set the derivative to 0 -> P(Y=1) = M(Y).
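
A minimal numerical sketch of this argument, in Python rather than PSPP
syntax. Everything in it is an assumption made for illustration: the data
are simulated, and the sample size, the "true" coefficients and the names
fit_logistic, mean_fitted and score_at_mean are hypothetical, not taken from
your project. It fits a logistic model with an intercept and one regressor
by Newton-Raphson, compares the mean fitted probability with the observed
proportion of Y = 1, and checks that the derivative of the intercept-only
log-likelihood is zero at P(Y=1) = M(Y).

```python
import math
import random


def sigmoid(z):
    """Logistic response function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Maximum likelihood fit of P(Y=1|x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative second derivatives), 2 x 2 and symmetric.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score vector.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated data (assumed values, for illustration only).
random.seed(0)
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if random.random() < sigmoid(-1.0 + 0.8 * xi) else 0 for xi in x]

b0, b1 = fit_logistic(x, y)
fitted = [sigmoid(b0 + b1 * xi) for xi in x]

mean_y = sum(y) / n            # observed relative frequency f(Y=1)
mean_fitted = sum(fitted) / n  # mean of the predicted probabilities

# Intercept-only model: derivative of the Bernoulli log-likelihood
# with respect to P(Y=1), evaluated at P(Y=1) = M(Y).
ones = sum(y)
score_at_mean = ones / mean_y - (n - ones) / (1.0 - mean_y)

print("mean of Y:", mean_y)
print("mean of fitted probabilities:", mean_fitted)
print("intercept-only score at M(Y):", score_at_mean)
```

The two means agree up to rounding because the first-order condition for
the intercept is sum(y - p) = 0 at the maximum likelihood estimates. If the
averages differ clearly on real data, one thing worth checking is whether
the coefficients and the coding of the variables in the hand calculation
match the fitted model.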
Kind regards
Dr. Oliver Walter
On 23.03.2021 at 18:39, Ricardo Mejias wrote:
Unlike for linear OLS regression, PSPP logistic regression does not
produce a calculated dependent variable, which I need for my project.
When I use the coefficients of the logistic regression to do the
calculation on the same data in this way:
COMPUTE CalcDep = 1/(1 + EXP(-(-5.844816350213
    - 3.733929982147*Party20210112Rep
    - 3.429046437566*Party20210112Dem
    - 3.537704000024*DemNpaLpf
    - 3.867034376711*RepNpaLpf
    + 0.92585743209*WhiteNotHisp
    - 0.309549809307*Hispanic
    - 0.242244899198*BlackNotHisp
    + 0.699661534759*Genders
    - 0.002047977071*AgeInMonths
    - 0.000010353254*PopulationPerSqrMileN
    - 0.00000071631*AvgHouseValuePerPersonN
    + 0.000001170117*AverageIncomePerPersonN))).
the average of the values of the calculated dependent variable
(CalcDep) is much different from the average of the actual dependent
variable (Depen20210209LPF), unlike in linear OLS regression where
these averages or totals are always the same. I think that when I used
logistic regression in SAS, it was the same way.
I have searched the internet extensively to find whether logistic
calculated and actual dependent variables should have the same
average. But despite the large availability of good material on
logistic regression, I could not find anything on this subject.
Do you have an answer to this question?
And could that answer be related to why there is no feature in PSPP to
show the calculated dependent variable in logistic regression?
This request does not require samples of data and code since the
answer to it does not depend on them.