Dear R users,

I noticed a problem with the anova command when it is applied to a single coxph object and the data contain missing observations.
This example code was run on R-2.6.1:

> library(survival)
> data(colon)
> colondeath = colon[colon$etype==2, ]
> m = coxph(Surv(time, status) ~ rx + sex + age + perfor, data=colondeath)
> m
Call:
coxph(formula = Surv(time, status) ~ rx + sex + age + perfor, data = colondeath)

               coef exp(coef) se(coef)      z      p
rxLev     -0.028895     0.972  0.11037 -0.262 0.7900
rxLev+5FU -0.374286     0.688  0.11885 -3.149 0.0016
sex       -0.000754     0.999  0.09431 -0.008 0.9900
age        0.002442     1.002  0.00405  0.603 0.5500
perfor     0.155695     1.168  0.26286  0.592 0.5500

Likelihood ratio test=12.8  on 5 df, p=0.0251  n= 929
> anova(m, test='Chisq')
Analysis of Deviance Table
 Cox model: response is Surv(time, status)
Terms added sequentially (first to last)

       Df  Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                      929     5860.4
rx      2      12.1       927     5848.2 2.302e-03
sex     1 2.054e-05       926     5848.2       1.0
age     1       0.3       925     5847.9       0.6
perfor  1       0.3       924     5847.6       0.6

Now I include nodes, which has some missing values:

> m = coxph(Surv(time, status) ~ rx + sex + age + perfor + nodes, data=colondeath)
> m
Call:
coxph(formula = Surv(time, status) ~ rx + sex + age + perfor + nodes, data = colondeath)

              coef exp(coef) se(coef)      z       p
rxLev     -0.08245     0.921  0.11168 -0.738 0.46000
rxLev+5FU -0.40310     0.668  0.12054 -3.344 0.00083
sex       -0.02854     0.972  0.09573 -0.298 0.77000
age        0.00547     1.005  0.00405  1.350 0.18000
perfor     0.19040     1.210  0.26335  0.723 0.47000
nodes      0.09296     1.097  0.00889 10.460 0.00000

Likelihood ratio test=88.3  on 6 df, p=1.11e-16  n=911 (18 observations deleted due to missingness)
> anova(m, test='Chisq')
Analysis of Deviance Table
 Cox model: response is Surv(time, status)
Terms added sequentially (first to last)

       Df  Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                      911     5700.6
rx      2       0.0       909     5848.2       1.0
sex     1 2.054e-05       908     5848.2       1.0
age     1       0.3       907     5847.9       0.6
perfor  1       0.3       906     5847.6       0.6
nodes   1     235.3       905     5612.3 4.253e-53

The strange thing is that rx is not significant anymore: its deviance is reported as 0.0, and the residual deviance rises from 5700.6 for the NULL row to 5848.2 once rx is added. The rows for sex, age and perfor are identical to those in the first table, even though 18 observations were dropped.
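One way to check whether the missing values are responsible would be to refit on the complete cases only, so that every nested model uses the same rows. The following is only a minimal sketch of such a check, not a fix. The names vars, cc, m_rx, m_full and lr are introduced here for illustration, and the sketch assumes that the colon data are available once survival is loaded.

```r
library(survival)

# Same subset as above: death events only
colondeath <- colon[colon$etype == 2, ]

# Keep only rows that are complete for every variable in the full model,
# so that all nested fits are based on the same observations
vars <- c("time", "status", "rx", "sex", "age", "perfor", "nodes")
cc <- colondeath[complete.cases(colondeath[, vars]), ]

# Model with rx only, and the full model, both fitted to the complete cases
m_rx   <- coxph(Surv(time, status) ~ rx, data = cc)
m_full <- coxph(Surv(time, status) ~ rx + sex + age + perfor + nodes, data = cc)

# Likelihood ratio test for rx against the null model, computed by hand:
# loglik[1] is the null log-likelihood, loglik[2] the fitted one
lr <- 2 * diff(m_rx$loglik)
pchisq(lr, df = 2, lower.tail = FALSE)

# Sequential table on data without missing values, for comparison
anova(m_full, test = "Chisq")
```

If rx comes out significant here, as it does in the first table, that would suggest the 0.0 deviance above results from comparing fits that are based on different numbers of rows.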
The documentation for anova.coxph contains the following warning:

> The comparison between two or more models by |anova| or will only be
> valid if they are fitted to the same dataset. This may be a problem if
> there are missing values.

However, I passed in a single object to be analyzed sequentially. Is this a bug in R, or is it covered by the warning?

Best wishes,
Matthias