Hi all, I'm using a logistic regression model (fit with 'glm') with 3 predictor variables to separate true positives from errors in a data set. Overall it seems to perform quite well, but for some reason the logit values seem to be much lower than they should be. What I mean is that in order to get ~90% sensitivity and ~90% precision I have to set my logit cutoff at around -1 to 0. From my (very limited) understanding, a logit cutoff of 0 should give you around 50% precision (half your final data set is TP, half is FP). I see this effect even when I run the model on the same data it was trained on. My only idea for a cause so far is that my training data set had roughly 10x as many true-negative data points as true-positive data points, but evening them out didn't seem to fix the problem much.
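For context, here is a quick sketch (plain base R, not my actual pipeline) of how I'm translating logit cutoffs into predicted probabilities, plus the prior log-odds that a 10:1 negative:positive class ratio would imply:

```r
# plogis() is base R's inverse logit: it maps a logit (log-odds) to a probability.
plogis(0)    # 0.5   -> a logit cutoff of 0 means a 50% predicted probability
plogis(-1)   # ~0.27 -> my working cutoff of -1 is roughly a 27% probability

# With ~10x as many negatives as positives in the training data, the
# baseline (prior) log-odds of the positive class is already well below zero:
log(1/10)    # ~ -2.30
```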
Here is my model summary with output from R's glm:

=====================================
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.48817  -0.17130  -0.10221  -0.05374   3.36833

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.85666    0.33868  -2.529 0.011425 *
var1         1.08770    0.15364   7.080 1.45e-12 ***
var2         0.67537    0.08003   8.439  < 2e-16 ***
var3        -1.25332    0.33595  -3.731 0.000191 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1230.63  on 2034  degrees of freedom
Residual deviance:  341.81  on 2031  degrees of freedom
=====================================

thanks in advance!

--
View this message in context: http://r.789695.n4.nabble.com/Logistic-regression-model-returns-lower-than-expected-logit-tp3526542p3526542.html
Sent from the R help mailing list archive at Nabble.com.