Dear R users, Hopefully someone can help me, Maybe I just misunderstand the function in the package? I am working with a logistic regression model. Until now I always worked with the basic glm function, where for the model was: ¡§ glm( disease ~ test.value + cnct , family=binomial(link=¡¦logit¡¦) ¡¨.
This works fine when test .value and concentration (cnct) are continuous vairables. However, concentration is in fact a grouping variable over 5 experiments with 5 concentrations ( 25, 50, 100, 200 & 400). Therefore I believe concentration to be an ordered factor ( in model : cnct_o). To make this model I used the ¡§rms¡¨ library (previously known as Design) and functions lrm (or Glm). The lrm (or Glm) returns the odds for disease, the ¡§inv.logit (odds) ¡¨ gives the probability of disease, but I have to do this with the Predict function of the ¡§rms¡¨ package. ##### The resulting model (with lrm or Glm) would be : Coef S.E. Wald Z Pr(>|Z|) Intercept 23.800 0.8891 2.68 0.0074 test.value 20.806 0.3409 6.10 <0.0001 cnct_o -0.1127 0.0268 -4.21 <0.0001 cnct_o=100 77.393 17.542 4.41 <0.0001 cnct_o=200 204.291 45.080 4.53 <0.0001 cnct_o=400 427.829 98.180 4.36 <0.0001 ##### The results of the standard glm function are very different : ##### Standard glm Deviance Residuals: Min 1Q Median 3Q Max -2.7361 -0.2750 0.2177 0.5143 1.6897 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.9022 0.3370 -2.677 0.007427 ** test.value 2.0806 0.3409 6.103 1.04e-09 *** cnct_o.L 1.4363 0.4722 3.042 0.002352 ** cnct_o.Q 1.2208 0.4934 2.474 0.013359 * cnct_o.C -2.0649 0.5610 -3.681 0.000232 *** cnct_o^4 0.5599 0.4760 1.176 0.239485 --- Signif. codes: 0 ¡¥***¡¦ 0.001 ¡¥**¡¦ 0.01 ¡¥*¡¦ 0.05 ¡¥.¡¦ 0.1 ¡¥ ¡¦ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 252.32 on 220 degrees of freedom Residual deviance: 167.68 on 215 degrees of freedom AIC: 179.68 Number of Fisher Scoring iterations: 6 ##### As I use the model parameters of the standard glm model to calculate the odds and probability manualy, I believe cnct_o = 25 to be the reference category and that cnct_o.L = 50, cnct_o.Q=100, cnct_o.C = 200 and cnct_o^4= 400. But I am not sure of this. The formulas used are : Odds <- intercept + slope * test.value + cnct_o , where cnct_o is the corresponding value for the given concentration. Probability <- inv.logit ( Odds ), the function inv.logit from package ¡§car¡¨. The results of the glm are in the table below, which are first the odds and then the probability¡¦s ( inv.logit (odds)). ##### glm odds cntc: test.value 25 cnct_o.L cnct_o.Q cnct_o.C cnct_o^4 0 -0.90218 0.534169 0.318649 -2.96706 -0.34231 0.5 0.138132 1.574477 1.358957 -1.92675 0.698001 1 1.17844 2.614785 2.399265 -0.88644 1.738309 1.5 2.218747 3.655093 3.439572 0.153864 2.778616 2 3.259055 4.6954 4.47988 1.194172 3.818924 ##### glm prob. cntc: test.value 25 cnct_o.L cnct_o.Q cnct_o.C cnct_o^4 0 0.288604 0.630455 0.578995 0.048936 0.415249 0.5 0.534478 0.828421 0.79559 0.127111 0.667744 1 0.764667 0.931807 0.916771 0.291844 0.850472 1.5 0.90192 0.974793 0.968919 0.53839 0.941509 2 0.962997 0.990946 0.988792 0.767486 0.97852 ##### If I compare this with the result of the Predict function in rms, the results seem very different, it can be because I misinterpret the glm model parameters for the ordered factor. How can I be sure which model parameter corresponds to which factor in the standard glm. ##### Results of lrm: Predict.lrm cntc: test.value 25 50 100 200 400 0 -0,43815 -3,25628 -1,15323 0,264036 0,072751 0,50580154 0,614227 -2,2039 -0,10085 1,316414 1,125129 1,01160308 1,666606 -1,15153 0,951526 2,368793 2,177508 1,505068 2,693316 -0,12482 1,978236 3,395503 3,204218 2,01086954 3,745695 0,927563 3,030615 4,447882 4,256597 ##### Prob. (inv.logit(odds)) cntc: test.value 25 50 100 200 400 0 0,392182 0,037102 0,239899 0,565628 0,51818 0,50580154 0,648904 0,0994 0,474808 0,788585 0,754939 1,01160308 0,841123 0,24021 0,721422 0,914416 0,898211 1,505068 0,936631 0,468837 0,878493 0,967564 0,960993 2,01086954 0,976926 0,716581 0,953938 0,988432 0,986028 ##### One more problem occurs when I calculate the odds and probability manually, using the formulas as with the standard glm I get the following (very different) results: man.lrm.odds cntc: test.value 25 50 100 200 400 0 2,37998 2,267255 10,11929 22,80909 45,16286 0,5 3,420288 3,307563 11,1596 23,8494 46,20316 1 4,460596 4,34787 12,19991 24,8897 47,24347 1,5 5,500903 5,388178 13,24022 25,93001 48,28378 2 6,541211 6,428486 14,28053 26,97032 49,32409 man.lrm.prob test.value 25 50 100 200 400 0 0,915288 0,906129 0,99996 1 1 0,5 0,968333 0,964687 0,999986 1 1 1 0,988577 0,987231 0,999995 1 1 1,5 0,995934 0,995451 0,999998 1 1 2 0,998559 0,998388 0,999999 1 1 ##### It is clear to me that I am missing something essential about this lrm function. Question: Are the model parameters for ordered factors converted in some way by from the lrm and Glm function? » Which could expain why I get a different result for the Predict function, then when I calculate it manually. Question: Is it correct that the first level of the ordered factor is used as reference category by the standard glm function? It there a way of giving them a different name, as it does automatically for the nominal factor? Thank you for reading my questions, And please share any insight that you have. Kind regards, Tom Disclaimer: click here [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.