pufftissue wrote:
Hi,

When I use logistic regression, each variable has a p value associated with
it.  Do I only include the variables that have a statistically significant p
value (<0.05), or are there situations when I should include variables when
their p values are high?  I had heard that if a variable has a high p value
but it's not the terminal variable, keep it; otherwise, take it out.  Not
sure if it's right or even why this is the case.  What about if my p values
are terrible but this combo of variables yields the highest AUC and
calibration?  What prevails in this case?

Thank you!

It depends on your goals, but in general the problems caused by stepwise regression arise from using P-value cutoffs that are too small, not from cutoffs that are too large. If you want valid confidence intervals, P-values, and discrimination indexes, there are many reasons not to remove any variables at all. Note also that AUC is not a great objective function; that's why we have the log likelihood.
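
To illustrate the point about AUC versus the log likelihood, here is a minimal sketch in plain Python (not R, and not code from this thread). The outcomes and predicted probabilities are made up, and the helper names `auc` and `log_likelihood` are hypothetical. It relies on the fact that AUC depends only on how the predictions are ranked, so two sets of predictions with the same ranking get the same AUC even when one is badly calibrated, while the log likelihood tells them apart.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def auc(y, p):
    """Fraction of (event, non-event) pairs ranked correctly; ties count as 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Made-up data, for illustration only.
y = [0, 0, 1, 0, 1, 1, 0, 1]
p_a = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.90]

# Same ranking as p_a, but every probability is pushed too low (p -> p^3).
p_b = [pi ** 3 for pi in p_a]

auc_a, auc_b = auc(y, p_a), auc(y, p_b)
ll_a, ll_b = log_likelihood(y, p_a), log_likelihood(y, p_b)

print("AUC:            ", auc_a, auc_b)
print("log likelihood: ", round(ll_a, 3), round(ll_b, 3))
```

In this toy case the two AUC values are identical, while the distorted predictions have a clearly lower log likelihood. That is one way to see why a rank-based index like AUC can miss problems that the log likelihood penalizes.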

Frank
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
