Darin Brooks wrote:
> Sorry.
>
> Let me try again then.
>
> I am trying to find "significant" predictors from a list of about 44
> independent variables. So I started with all 44 variables and ran
> drop1(sep22lr, test="Chisq")... and then dropped the highest p value from
> the run. Then I reran the drop1.
>
> Model:
> MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_MST_1 +
>     SOIL_NUTR + cE + cN + cELEV + cDIAM_125 + cCRCLS + cCULM_125 +
>     cSPH + cAGE + cVRI_NONPINE + cVRI_nonpineCFR + cVRI_BLEAF +
>     cvol_125 + cstrDST_SW + cwaterDST_SW + cSEEDSRCE_SW + cMAT +
>     cMWMT + cMCMT + cTD + cMAP + cMSP + cAHM + cSHM + cMATMAP +
>     cddless0 + cddless18 + cddgrtr0 + cddgrtr18 + cNFFD + cbFFP +
>     ceFFP + cPAS + cDD5_100 + cEXT_Cold + cS_INDX
>                 Df Deviance    AIC   LRT   Pr(Chi)
> <none>               814.21 938.21
> ORG_CODE         4   824.97 940.97 10.76 0.0294100 *
> BECLBL08         9   845.61 951.61 31.41 0.0002519 ***
> PEM_SScat       10   829.11 933.11 14.90 0.1357580
> SOIL_MST_1       1   814.63 936.63  0.43 0.5135094
> SOIL_NUTR        2   818.49 938.49  4.28 0.1175411
> cE               1   814.37 936.37  0.16 0.6886085
> cN               1   814.40 936.40  0.20 0.6566765
> cELEV            1   814.35 936.35  0.14 0.7044864
> cDIAM_125        1   817.98 939.98  3.78 0.0519554 .
> cCRCLS           1   819.32 941.32  5.11 0.0237598 *
> cCULM_125        1   816.17 938.17  1.97 0.1606846
> cSPH             1   816.62 938.62  2.41 0.1204141
> cAGE             1   815.92 937.92  1.72 0.1902314
> cVRI_NONPINE     1   818.04 940.04  3.84 0.0501149 .
> cVRI_nonpineCFR  1   821.17 943.17  6.96 0.0083197 **
> cVRI_BLEAF       1   818.78 940.78  4.58 0.0324286 *
> cvol_125         1   814.67 936.67  0.47 0.4949495
> cstrDST_SW       1   814.63 936.63  0.42 0.5169757
> cwaterDST_SW     1   814.75 936.75  0.55 0.4592643
> cSEEDSRCE_SW     1   817.73 939.73  3.53 0.0604234 .
> cMAT             1   814.27 936.27  0.06 0.8002333
> cMWMT            1   814.49 936.49  0.28 0.5942246
> cMCMT            1   819.39 941.39  5.18 0.0228425 *
> cTD              1   816.20 938.20  1.99 0.1580332
> cMAP             1   814.25 936.25  0.04 0.8386626
> cMSP             1   818.41 940.41  4.20 0.0404411 *
> cAHM             1   815.66 937.66  1.46 0.2276311
> cSHM             1   819.95 941.95  5.75 0.0165227 *
> cMATMAP          1   814.91 936.91  0.71 0.4001878
> cddless0         1   818.04 940.04  3.83 0.0502153 .
> cddless18        1   817.81 939.81  3.60 0.0576931 .
> cddgrtr0         1   816.64 938.64  2.44 0.1184235
> cddgrtr18        1   815.77 937.77  1.57 0.2104958
> cNFFD            1   815.38 937.38  1.18 0.2782582
> cbFFP            1   814.39 936.39  0.18 0.6677481
> ceFFP            1   820.22 942.22  6.01 0.0141863 *
> cPAS             1   814.21 936.21  0.01 0.9347654
> cDD5_100         1   814.79 936.79  0.58 0.4447531
> cEXT_Cold        1   816.99 938.99  2.78 0.0954512 .
> cS_INDX          1   815.21 937.21  1.01 0.3157208
>
>
> And then systematically reran the drop1, removing the HIGHEST p value (least
> significant) from each result until only significant (p < 0.10) variables
> remained.
>
> Model:
> MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_NUTR +
>     cSEEDSRCE_SW + cMSP + ceFFP + cEXT_Cold
>              Df Deviance    AIC   LRT   Pr(Chi)
> <none>            884.20 946.20
> ORG_CODE      4   916.38 970.38 32.18 1.757e-06 ***
> BECLBL08      9   940.66 984.66 56.46 6.418e-09 ***
> PEM_SScat    11   906.20 946.20 22.00 0.0243795 *
> SOIL_NUTR     2   894.19 952.19  9.99 0.0067557 **
> cSEEDSRCE_SW  1   894.41 954.41 10.21 0.0013983 **
> cMSP          1   896.97 956.97 12.77 0.0003516 ***
> ceFFP         1   928.50 988.50 44.30 2.812e-11 ***
> cEXT_Cold     1   923.35 983.35 39.15 3.921e-10 ***
>
>
> I didn't create any kind of dummy or factor variables for my categorical
> data (at least, not on purpose).
>
> With the remaining 8 variables, I tried to run a logistic regression (glm)
> against my dependent variable (MIN_Mstocked). When I do a summary of the
> glm, I am provided with the usual table of estimate, std error, z value, and
> Pr(>|z|)... BUT there are some coefficients missing in the list. None of
> the categorical data is complete. Some are missing only one category, while
> others are missing 4 or 5 categories.
>
> e.g.
>
> Coefficients:
>                        Estimate Std. Error z value Pr(>|z|)
> (Intercept)          -1.324e+02  1.363e+03  -0.097 0.922611
> ORG_CODE[T.DLA]      -1.504e+01  1.363e+03  -0.011 0.991192
> ORG_CODE[T.DMO]      -1.494e+01  1.363e+03  -0.011 0.991253
> ORG_CODE[T.DPG]      -1.766e+01  1.363e+03  -0.013 0.989658
> ORG_CODE[T.DVA]      -1.841e+01  1.363e+03  -0.014 0.989220
> BECLBL08[T.SBS dw 2] -6.733e-01  5.903e-01  -1.141 0.254033
> BECLBL08[T.SBS dw 3] -1.094e+00  5.714e-01  -1.914 0.055586 .
> BECLBL08[T.SBS mc 2]  1.573e-01  5.004e-01   0.314 0.753211
> BECLBL08[T.SBS mc 3]  1.402e+00  5.824e-01   2.408 0.016043 *
> BECLBL08[T.SBS mk 1] -2.388e+00  7.529e-01  -3.172 0.001514 **
> BECLBL08[T.SBS mw]   -1.672e+01  1.393e+03  -0.012 0.990425
> BECLBL08[T.SBS vk]   -1.614e+01  1.243e+03  -0.013 0.989640
> BECLBL08[T.SBS wk 1] -3.640e+00  8.174e-01  -4.453 8.48e-06 ***
> BECLBL08[T.SBS wk 3] -1.838e+01  1.363e+03  -0.013 0.989240
> PEM_SScat[T.B]       -1.815e+01  3.956e+03  -0.005 0.996339
> PEM_SScat[T.C]        1.998e-01  3.925e-01   0.509 0.610792
> PEM_SScat[T.D]       -2.314e-01  3.215e-01  -0.720 0.471621
> PEM_SScat[T.E]        5.581e-01  3.433e-01   1.626 0.104020
> PEM_SScat[T.F]       -1.113e+00  5.782e-01  -1.926 0.054153 .
> PEM_SScat[T.G]        1.780e-01  4.420e-01   0.403 0.687150
> PEM_SScat[T.H]        1.670e+01  3.956e+03   0.004 0.996633
> PEM_SScat[T.I]        2.751e-01  9.313e-01   0.295 0.767705
> PEM_SScat[T.J]       -2.623e-01  9.693e-01  -0.271 0.786649
> PEM_SScat[T.K]       -1.862e+01  3.956e+03  -0.005 0.996244
> PEM_SScat[T.L]       -1.661e+01  1.211e+03  -0.014 0.989056
> SOIL_NUTR[T.C]       -1.119e+00  3.781e-01  -2.960 0.003073 **
> SOIL_NUTR[T.D]       -7.912e-02  9.049e-01  -0.087 0.930320
> cSEEDSRCE_SW         -1.512e-03  4.930e-04  -3.066 0.002170 **
> cMSP                  1.808e-02  5.304e-03   3.409 0.000652 ***
> ceFFP                 2.889e-01  4.662e-02   6.196 5.80e-10 ***
> cEXT_Cold            -1.880e+00  3.330e-01  -5.647 1.63e-08 ***
>
> There should be a PEM_SScat[T.A]. It is the most prevalent occurrence in
> this category.
>
> ORG_CODE is missing more than 6 categories in the list.
>
> SOIL_NUTR should have a [T.B].
>
> Does that help?
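For reference, the drop-the-largest-p-value loop described above can be written out in R roughly as follows. This is a minimal sketch on simulated data: the data frame `dat`, the variables `y`, `x1`..`x4`, `g` and the helper `backward_drop1()` are all hypothetical, and the 0.10 cut-off simply mirrors the one used above. It reads the p-value from the last column of the `drop1()` table, because that column is labelled `Pr(Chi)` in the output above but `Pr(>Chi)` in current versions of R. It only mechanizes the procedure; it does nothing about the problems with stepwise selection raised in the reply.

```r
# Simulated, hypothetical data: only x1 and x2 actually affect y.
set.seed(1)
n   <- 200
dat <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n),
  g  = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.6 * dat$x2))

# Hypothetical helper: repeatedly run drop1(..., test = "Chisq") and remove
# the term with the largest p-value until every remaining term has p < alpha.
backward_drop1 <- function(fit, alpha = 0.10) {
  repeat {
    tab <- drop1(fit, test = "Chisq")
    p   <- tab[-1, ncol(tab)]          # last column holds the p-values; row 1 is <none>
    names(p) <- rownames(tab)[-1]
    if (length(p) == 0 || max(p, na.rm = TRUE) < alpha) return(fit)
    worst <- names(p)[which.max(p)]
    message("dropping ", worst, " (p = ", signif(max(p, na.rm = TRUE), 3), ")")
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
}

full  <- glm(y ~ x1 + x2 + x3 + x4 + g, family = binomial, data = dat)
final <- backward_drop1(full)
formula(final)
```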
Yes. I don't see a problem, however. First, your variables are "factors",
which means there will be one fewer coefficient than there are categories.
One level is a reference group, which probably explains PEM_SScat and
SOIL_NUTR each "missing" one coefficient. For ORG_CODE, there were 4 DF in
the starting model and 4 DF in the final model, matching the 4 coefficients.
So the 6 missing categories appear to have been missing from the start. What
do you expect for ORG_CODE? What does summary(ORG_CODE) give you?

Are you aware of the dangers of stepwise model fitting? It is a commonly
recurring theme on this list.

Kevin

> -----Original Message-----
> From: Kevin E. Thorpe [mailto:[EMAIL PROTECTED]
> Sent: Saturday, September 27, 2008 6:21 AM
> To: Darin Brooks
> Cc: r-help@r-project.org
> Subject: Re: [R] logistic regression
>
>
> Darin Brooks wrote:
>> Good afternoon
>>
>> I have what I hope is a simple logistic regression issue.
>>
>> I started with 44 independent variables and then used drop1 with
>> test="Chisq" to reduce the list to 8 significant independent variables.
>>
>> drop1(sep22lr, test="Chisq") and wound up with this model:
>>
>> Model: MIN_Mstocked ~ ORG_CODE + BECLBL08 + PEM_SScat + SOIL_NUTR +
>> cSEEDSRCE_SW + cMSP + ceFFP + cEXT_Cold
>>
>> 4 of the remaining variables are categorical and 4 are continuous.
>>
>> However, when I run a glm and then a summary on the glm, some of the
>> categorical data is missing from the output.
>>
>> The PEM_SScat is missing only one variable ... the BECLBL08 is missing
>> several variables ... the ORG_CODE is missing 4 ... and the SOIL_NUTR
>> is missing 1 variable.
>>
>> The number of missing variables seems arbitrary. Is there something
>> wrong with my syntax in calling the logistic model? Am I not
>> understanding the inputs correctly?
>>
>> Any help would be appreciated.
>>
>
> I'm not sure I fully understand your question. It sounds like you created
> your own dummy variables for your categorical variables. Did you? Or did
> you use factor variables for your categorical variables?
> If the latter, then I REALLY don't understand your question.
>
> Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: [EMAIL PROTECTED]  Tel: 416.864.5776  Fax: 416.864.6057

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.