Dear Mr Mejias,

Predicted probabilities from logistic regression, given the regressors X, should agree with the observed relative frequencies given X, because in logistic regression E(Y | X) = P(Y = 1 | X) = 1/(1+exp(-x'β)) (the response function). The maximum likelihood estimator of E(Y | X) equals M(Y | X), i.e. the mean of Y given the regressors X, which is the observed relative frequency of Y = 1 given X because Y is a binary variable. Hence M(Y | X) = f(Y=1 | X) = estimated P(Y = 1 | X) = 1/(1+exp(-x' estimated β)). (Strictly, this holds pattern by pattern only when the model is saturated in X; what holds for any model with an intercept is the equality of the averages, because the likelihood equation for the intercept is sum(y_i - estimated p_i) = 0.) Averaging over X gives M(M(Y|X)) = M(Y) = f(Y=1), the sample analogue of the law of iterated expectations, so the mean of the predicted probabilities equals the mean of Y.

You will get the same result if you estimate the unconditional P(Y = 1) = E(Y) directly by maximum likelihood. This can be shown by calculus: likelihood function -> log likelihood -> first derivative of the log likelihood with respect to P(Y=1) -> set the derivative to 0 -> estimated P(Y=1) = M(Y).
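
The argument above can be checked numerically. The following is only a minimal sketch in Python on synthetic data, not PSPP's implementation: the helper names (sigmoid, fit_logistic), the single regressor, the sample size and the "true" coefficients -1.0 and 0.8 are all assumptions made for illustration. It fits a logistic regression with an intercept by Newton-Raphson and compares the mean of the fitted probabilities with the mean of Y, and then repeats the comparison with a deliberately wrong intercept.

```python
import math
import random


def sigmoid(z):
    # Logistic response function 1 / (1 + exp(-z)), written to avoid overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, iterations=50):
    """ML fit of P(Y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Score vector: gradient of the log likelihood w.r.t. (b0, b1).
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g explicitly.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


# Synthetic data; the generating coefficients are arbitrary assumptions.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
ys = [1 if random.random() < sigmoid(-1.0 + 0.8 * x) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ys)
mean_y = sum(ys) / len(ys)
# Mean of the fitted probabilities using the ML estimates.
mean_fitted = sum(sigmoid(b0 + b1 * x) for x in xs) / len(xs)
# Same calculation with the intercept shifted by 0.5, i.e. not the ML estimate.
mean_perturbed = sum(sigmoid(b0 + 0.5 + b1 * x) for x in xs) / len(xs)

print("mean of Y:                      ", mean_y)
print("mean of fitted probabilities:   ", mean_fitted)
print("mean with a perturbed intercept:", mean_perturbed)
```

In this sketch the first two means agree up to numerical precision, while the third does not. If the same pattern holds in practice, a clear gap between the two averages would suggest that the coefficients or the cases used in the calculation differ from those used in the fit, rather than reflecting a property of logistic regression itself.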


Kind regards


Dr. Oliver Walter



On 23.03.2021 at 18:39, Ricardo Mejias wrote:

Unlike for linear OLS regression, PSPP logistic regression does not produce a calculated dependent variable, which I need for my project. When I use the coefficients of the logistic regression to do the calculation on the same data in this way:


COMPUTE CalcDep = 1/(1 + EXP(-(-5.844816350213
    - 3.733929982147*Party20210112Rep
    - 3.429046437566*Party20210112Dem
    - 3.537704000024*DemNpaLpf
    - 3.867034376711*RepNpaLpf
    + 0.92585743209*WhiteNotHisp
    - 0.309549809307*Hispanic
    - 0.242244899198*BlackNotHisp
    + 0.699661534759*Genders
    - 0.002047977071*AgeInMonths
    - 0.000010353254*PopulationPerSqrMileN
    - 0.00000071631*AvgHouseValuePerPersonN
    + 0.000001170117*AverageIncomePerPersonN))).


the average of the values of the calculated dependent variable (CalcDep) is much different from the average of the actual dependent variable (Depen20210209LPF), unlike in linear OLS regression, where these averages or totals are always the same. I think that when I used logistic regression in SAS, it was the same way.


I have searched the internet extensively to find whether logistic calculated and actual dependent variables should have the same average. But despite the large availability of good material on logistic regression, I could not find anything on this subject.


Do you have an answer to this question?


And could that answer be related to why there is no feature in PSPP to show the calculated dependent variable in logistic regression?


This request does not require samples of data and code since the answer to it does not depend on them.



