Re: [R] FW: logistic regression

Frank E Harrell Jr Sat, 27 Sep 2008 13:44:16 -0700

Darin Brooks wrote:

Sorry.


Let me try again then.

I am trying to find "significant" predictors from a list of about 44
independent variables.  So I started with all 44 variables and ran


Why?  What is wrong with insignificant predictors?

drop1(sep22lr, test="Chisq")... and then dropped the variable with the
highest p value from the run. Then I reran the drop1.
Model:
MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_MST_1 + SOIL_NUTR +
    cE + cN + cELEV + cDIAM_125 + cCRCLS + cCULM_125 + cSPH + cAGE +
    cVRI_NONPINE + cVRI_nonpineCFR + cVRI_BLEAF + cvol_125 + cstrDST_SW +
    cwaterDST_SW + cSEEDSRCE_SW + cMAT + cMWMT + cMCMT + cTD + cMAP +
    cMSP + cAHM + cSHM + cMATMAP + cddless0 + cddless18 + cddgrtr0 +
    cddgrtr18 + cNFFD + cbFFP + ceFFP + cPAS + cDD5_100 + cEXT_Cold +
    cS_INDX

                Df Deviance    AIC   LRT   Pr(Chi)
<none>               814.21 938.21
ORG_CODE         4   824.97 940.97 10.76 0.0294100 *
BECLBL08         9   845.61 951.61 31.41 0.0002519 ***
PEM_SScat       10   829.11 933.11 14.90 0.1357580
SOIL_MST_1       1   814.63 936.63  0.43 0.5135094
SOIL_NUTR        2   818.49 938.49  4.28 0.1175411
cE               1   814.37 936.37  0.16 0.6886085
cN               1   814.40 936.40  0.20 0.6566765
cELEV            1   814.35 936.35  0.14 0.7044864
cDIAM_125        1   817.98 939.98  3.78 0.0519554 .
cCRCLS           1   819.32 941.32  5.11 0.0237598 *
cCULM_125        1   816.17 938.17  1.97 0.1606846
cSPH             1   816.62 938.62  2.41 0.1204141
cAGE             1   815.92 937.92  1.72 0.1902314
cVRI_NONPINE     1   818.04 940.04  3.84 0.0501149 .
cVRI_nonpineCFR  1   821.17 943.17  6.96 0.0083197 **
cVRI_BLEAF       1   818.78 940.78  4.58 0.0324286 *
cvol_125         1   814.67 936.67  0.47 0.4949495
cstrDST_SW       1   814.63 936.63  0.42 0.5169757
cwaterDST_SW     1   814.75 936.75  0.55 0.4592643
cSEEDSRCE_SW     1   817.73 939.73  3.53 0.0604234 .
cMAT             1   814.27 936.27  0.06 0.8002333
cMWMT            1   814.49 936.49  0.28 0.5942246
cMCMT            1   819.39 941.39  5.18 0.0228425 *
cTD              1   816.20 938.20  1.99 0.1580332
cMAP             1   814.25 936.25  0.04 0.8386626
cMSP             1   818.41 940.41  4.20 0.0404411 *
cAHM             1   815.66 937.66  1.46 0.2276311
cSHM             1   819.95 941.95  5.75 0.0165227 *
cMATMAP          1   814.91 936.91  0.71 0.4001878
cddless0         1   818.04 940.04  3.83 0.0502153 .
cddless18        1   817.81 939.81  3.60 0.0576931 .
cddgrtr0         1   816.64 938.64  2.44 0.1184235
cddgrtr18        1   815.77 937.77  1.57 0.2104958
cNFFD            1   815.38 937.38  1.18 0.2782582
cbFFP            1   814.39 936.39  0.18 0.6677481
ceFFP            1   820.22 942.22  6.01 0.0141863 *
cPAS             1   814.21 936.21  0.01 0.9347654
cDD5_100         1   814.79 936.79  0.58 0.4447531
cEXT_Cold        1   816.99 938.99  2.78 0.0954512 .
cS_INDX          1   815.21 937.21  1.01 0.3157208

And then systematically reran the drop1, removing the variable with the
HIGHEST p value (least significant) from each result until only variables
significant at the 0.10 level remained.

Model:
MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_NUTR +
    cSEEDSRCE_SW + cMSP + ceFFP + cEXT_Cold

                Df Deviance    AIC   LRT   Pr(Chi)
<none>               884.20 946.20
ORG_CODE         4   916.38 970.38 32.18 1.757e-06 ***
BECLBL08         9   940.66 984.66 56.46 6.418e-09 ***
PEM_SScat       11   906.20 946.20 22.00 0.0243795 *
SOIL_NUTR        2   894.19 952.19  9.99 0.0067557 **
cSEEDSRCE_SW     1   894.41 954.41 10.21 0.0013983 **
cMSP             1   896.97 956.97 12.77 0.0003516 ***
ceFFP            1   928.50 988.50 44.30 2.812e-11 ***
cEXT_Cold        1   923.35 983.35 39.15 3.921e-10 ***


I didn't create any kind of dummy or factor variables for my categorical
data (at least, not on purpose).

With the remaining 8 variables, I tried to run a logistic regression (glm)
against my dependent variable (MIN_Mstocked).  When I do a summary of the

Estimates from this model (and especially standard errors and P-values)
will be invalid because they do not take into account the stepwise
procedure above that was used to torture the data until they confessed.
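
One way to see the point is to run the same kind of elimination on data in
which the outcome is unrelated to every predictor.  The sketch below is
illustrative only: the simulated data, the names V1..V20, the seed and the
helper backward_by_p() are assumptions made for this example and are not
part of the analysis in this thread.  It mimics the manual drop1 loop with a
0.10 cutoff.

```r
# Hypothetical simulation: none of the predictors is related to y.
set.seed(2008)
n <- 300
p <- 20
dat <- as.data.frame(matrix(rnorm(n * p), nrow = n))  # columns V1..V20
dat$y <- rbinom(n, size = 1, prob = 0.5)

# Backward elimination by p-value, mimicking repeated drop1() calls.
backward_by_p <- function(fit, alpha = 0.10) {
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break
    d  <- drop1(fit, test = "Chisq")
    pv <- d[[ncol(d)]][-1]            # LRT p-values; first row is <none>
    names(pv) <- rownames(d)[-1]
    if (max(pv) <= alpha) break       # everything left passes the cutoff
    worst <- names(pv)[which.max(pv)] # least "significant" term
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

full  <- glm(reformulate(paste0("V", 1:p), response = "y"),
             data = dat, family = binomial)
final <- backward_by_p(full)
summary(final)  # any terms that survive are noise by construction
```

In simulations like this it is common for a few noise variables to survive
the elimination with nominally small P-values, which is why the summary of
the final model cannot be read as if the model had been specified in advance.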


Frank

glm, I am provided with the usual table of estimate, std error, z value, and
Pr(>|z|)... BUT there are some coefficients missing in the list.  None of
the categorical data is complete.  Some are missing only one category, while
others are missing 4 or 5 categories.
e.g.

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)          -1.324e+02  1.363e+03  -0.097 0.922611
ORG_CODE[T.DLA]      -1.504e+01  1.363e+03  -0.011 0.991192
ORG_CODE[T.DMO]      -1.494e+01  1.363e+03  -0.011 0.991253
ORG_CODE[T.DPG]      -1.766e+01  1.363e+03  -0.013 0.989658
ORG_CODE[T.DVA]      -1.841e+01  1.363e+03  -0.014 0.989220
BECLBL08[T.SBS dw 2] -6.733e-01  5.903e-01  -1.141 0.254033
BECLBL08[T.SBS dw 3] -1.094e+00  5.714e-01  -1.914 0.055586 .
BECLBL08[T.SBS mc 2]  1.573e-01  5.004e-01   0.314 0.753211
BECLBL08[T.SBS mc 3]  1.402e+00  5.824e-01   2.408 0.016043 *
BECLBL08[T.SBS mk 1] -2.388e+00  7.529e-01  -3.172 0.001514 **
BECLBL08[T.SBS mw]   -1.672e+01  1.393e+03  -0.012 0.990425
BECLBL08[T.SBS vk]   -1.614e+01  1.243e+03  -0.013 0.989640
BECLBL08[T.SBS wk 1] -3.640e+00  8.174e-01  -4.453 8.48e-06 ***
BECLBL08[T.SBS wk 3] -1.838e+01  1.363e+03  -0.013 0.989240
PEM_SScat[T.B]       -1.815e+01  3.956e+03  -0.005 0.996339
PEM_SScat[T.C]        1.998e-01  3.925e-01   0.509 0.610792
PEM_SScat[T.D]       -2.314e-01  3.215e-01  -0.720 0.471621
PEM_SScat[T.E]        5.581e-01  3.433e-01   1.626 0.104020
PEM_SScat[T.F]       -1.113e+00  5.782e-01  -1.926 0.054153 .
PEM_SScat[T.G]        1.780e-01  4.420e-01   0.403 0.687150
PEM_SScat[T.H]        1.670e+01  3.956e+03   0.004 0.996633
PEM_SScat[T.I]        2.751e-01  9.313e-01   0.295 0.767705
PEM_SScat[T.J]       -2.623e-01  9.693e-01  -0.271 0.786649
PEM_SScat[T.K]       -1.862e+01  3.956e+03  -0.005 0.996244
PEM_SScat[T.L]       -1.661e+01  1.211e+03  -0.014 0.989056
SOIL_NUTR[T.C]       -1.119e+00  3.781e-01  -2.960 0.003073 **
SOIL_NUTR[T.D]       -7.912e-02  9.049e-01  -0.087 0.930320
cSEEDSRCE_SW         -1.512e-03  4.930e-04  -3.066 0.002170 **
cMSP                  1.808e-02  5.304e-03   3.409 0.000652 ***
ceFFP                 2.889e-01  4.662e-02   6.196 5.80e-10 ***
cEXT_Cold            -1.880e+00  3.330e-01  -5.647 1.63e-08 ***

There should be a PEM_SScat[T.A].  It is the most common level of this
variable.

ORG_CODE is missing more than 6 categories in the list.

SOIL_NUTR should have a [T.B].
Does that help?
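
A note on the missing levels: with R's default treatment contrasts, the
first level of each factor is the reference level.  It is absorbed into the
intercept and never gets its own row.  The Df column of the final drop1
table above (4, 9, 11 and 2) matches the number of coefficients listed for
each factor, which would be consistent with each factor losing only its
reference level.  A minimal sketch with made-up data follows; the factor
soil and its levels are hypothetical, and base R labels the columns soilB
rather than in the [T.B] style shown above, but the omitted reference level
works the same way.

```r
# Made-up factor with levels A, B, C (A sorts first, so it is the reference).
soil <- factor(c("A", "A", "B", "C", "A", "B", "C", "C"))
levels(soil)

# Default treatment contrasts: no column for the reference level A.
X <- model.matrix(~ soil)
colnames(X)   # "(Intercept)" "soilB" "soilC"

# Choosing another reference level changes which level is omitted.
soil2 <- relevel(soil, ref = "C")
colnames(model.matrix(~ soil2))   # "(Intercept)" "soil2A" "soil2B"

# Counts per level show which levels actually occur in the data.
table(soil)
```

Running table() on each categorical variable in the data used for the fit
is a quick way to check whether a level that seems to be missing is simply
the reference level or does not occur among the fitted observations.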
-----Original Message-----
From: Kevin E. Thorpe [mailto:[EMAIL PROTECTED]]
Sent: Saturday, September 27, 2008 6:21 AM
To: Darin Brooks
Cc: r-help@r-project.org
Subject: Re: [R] logistic regression


Darin Brooks wrote:
Good afternoon

I have what I hope is a simple logistic regression issue.

I started with 44 independent variables and then used the drop1,
test="chisq" to reduce the list to 8 significant independent variables.

drop1(sep22lr, test="Chisq") and wound up with this model:

Model: MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_NUTR +
cSEEDSRCE_SW + cMSP + ceFFP + cEXT_Cold

4 of the remaining variables are categorical and 4 are continuous.

However, when I run a glm and then a summary on the glm - some of the
categorical data is missing from the output.

The PEM_SScat is missing only one variable ... the BECLBL08 is missing
several variables ... the ORG_CODE is missing 4 .. and the SOIL_NUTR
is missing 1 variable.

It seems arbitrary to the number of variables missing. Is there
something wrong with my syntax in calling the logistic model? Am I not
understanding the inputs correctly?

Any help would be appreciated.

I'm not sure I fully understand your question.  It sounds like you created
your own dummy variables for your categorical variables. Did you?  Or did
you use factor variables for your categorical variables?
If the latter, then I REALLY don't understand your question.

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: [EMAIL PROTECTED]  Tel: 416.864.5776  Fax: 416.864.6057

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
