Dear Terry, Eleni & Heinz, Please forgive me for imposing on your time, but since we have had similar discussions before on this topic, I thought that it would be nice to get your feedback on this problem.
I have written a function to compute the cumulative incidence, for a given covariate vector Z, based on Cox PH models for each competing event type. It is based on Cheng, Fine, and Wei (Biometrics 1998), but is fairly straightforward. Fine calls it "indirect regression" approach, since it models each cause-specific hazard and then puts it all together using the basic definition of CIF. I applied it to the Green & Byar data on DES trial for prostate cancer (this data set was also discussed in Kay 1986 and Cheng et al. 1998). The data set is attached as STATA file, which can be read into R using "foreign" package. My function for CIF seems to do ok. However, I am not sure about this. I computed the CIFs of prostate cancer for two sets of covariates (see the code). I compared the indirect CIFs to the "direct regression " CIFs predicted by Fine & Gray's method. There appears to be some significant differences, especially for CVD deaths. The CIF plots for the 3 competing events can be easily generated by runnng the code. Why would there be a big difference between FG direct approach and the indirect CIF computed by my code? Is this because of the modeling assumptions in the direct regression approach of FG? Or is there some problem with my approach? I would appreciate any comments. Thank you, Ravi. ---------------------------------------------------------------------------- ------- Ravi Varadhan, Ph.D. Assistant Professor, The Center on Aging and Health Division of Geriatric Medicine and Gerontology Johns Hopkins University Ph: (410) 502-2619 Fax: (410) 614-9625 Email: rvarad...@jhmi.edu Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h tml ---------------------------------------------------------------------------- -------- -----Original Message----- From: Eleni Rapsomaniki [mailto:er...@medschl.cam.ac.uk] Sent: Friday, March 27, 2009 10:44 AM To: Terry Therneau; tuech...@gmx.at; Ravi Varadhan Cc: r-help@r-project.org Subject: RE: RE: Competing risks Kalbfleisch & Prentice method Dear Prof. Therneau, Thank you for your views on this subject. I think all R users who play with survival analysis are most grateful for the functions you have already supplied us with. I'm guessing Ravi is wondering why you have not implemented the smoothing of the baseline hazard from the Cox model. I actually tried to do this originally, inspired from this thread (i.e use sm.spline to smooth the hazard): https://stat.ethz.ch/pipermail/r-help/2004-July/053843.html but it overestimated the CI (perhaps I implemented it wrong). I was then advised to treat CI as a step function, rather than continuous, which means that F(t+1, cause k)-F(t, cause k) will be 0 unless an event of cause k has occurred in that interval (see also "Competing Risks, by Melanie Pintilie, page 62). This is obviously problematic if one wants to estimate the CI at times that are not close to observed events for either cause (perhaps a parametric model could be used in this case). But then again, this was not an issue wtih my data. Eleni Rapsomaniki Research Associate Strangeways Research Laboratory Department of Public Health and Primary Care University of Cambridge -----Original Message----- From: Terry Therneau [mailto:thern...@mayo.edu] Sent: 27 March 2009 13:53 To: Eleni Rapsomaniki; tuech...@gmx.at; Ravi Varadhan Cc: r-help@r-project.org Subject: RE: Competing risks Kalbfleisch & Prentice method Ravi's last note finished with > I am wondering why Terry Therneau's "survival" package doesn't have > this option. The short answer is that there are only so many hours in a day. I've recently moved the code base from an internal Mayo repository to R-forge, one long term goal with this is to broaden the developer base to n>2 (me and Thomas Lumley). A longer statistical answer: I'm not sure if the "this" of Ravi's question is a. smoothed hazards, b. the K&P cumulative incidence or c. the Fine & Gray model. b. I like the CI model and am using it more. We also have local code. The latest version of survival (on rforge, likely in the next default R release) has added simple CI curves to the survfit function. Adding code for survfit on Cox models is on the todo list. But -- this release also fixes up survfit.coxph to handle weighted Cox models and that was on my list for approx 10 years, i.e., don't hold your breath. I don't release something until it also has a set of worked out test cases to add to the 'tests' directory. a. smoothed hazards. For the case at hand I don't see any particular advantage of this. On the other hand, I often would like to display hazard functions instead of CI functions for Cox models; with time dependent covariates I don't think a survival curve makes sense. But I haven't had the time to think through exactly which methods should be added. c. Fine & Gray model, i.e., where covariates have a direct influence on the competing risk. I find the model completely untenable from a biologic point of view, so have no interest in adding it. (Due to finite time, everything in the survival package is code that I needed for an analysis; medical research is what pays my salary.) Assume that I have competing processes/risks, say progression of a tumor and heart disease; I expect that the tumor process pays no attention whatsoever to what is going on in the heart. But this is necessary if "type=squamous" is modeled as an absolute beta=__ increase in the CI for cancer. The squamous cells need to "step up the pace" of invasion if heart failure threatens, like jockeys in a horse race. Terry T.
require(foreign) require(survival) require(cmprsk) ################################################################ # Green & Byar data set gb <- read.dta("h:/competing risks/gb.dta") # N = 483 attach(gb) ################################# # Cox model fits for cause-specific hazards fit1 <- coxph( Surv(time, status=="Cancer") ~ treatment + ag + hx + sz + sg, method="breslow", data=gb) fit2 <- coxph( Surv(time, status=="CVD") ~ treatment + ag + hx + sz + sg, method="breslow", data=gb) fit3 <- coxph( Surv(time, status=="Other") ~ treatment + ag + hx + sz + sg, method="breslow", data=gb) fit <- coxph( Surv(time, status=="Cancer" | status=="CVD" | status=="Other") ~ treatment + ag + hx + sz + sg, method="breslow", data=gb) fits <- list(fit1, fit2, fit3, fit) ########################################################## cif <- function(fits, etype=1, z) { # Prediction of cumulative incidence curves (Cheng, Fine, Wei; Biometrics 1998) # Indirect regression method nfits <- length(fits) cif <- NULL # if z is missing mean covariate value is used if (missing(z)) { zmiss <- TRUE z <- matrix(NA, 1, 1) } else { zmiss <- FALSE if (is.null(dim(z))) z <- matrix(z, nrow=1, ncol=length(z)) } for (i in 1:nrow(z)) { sfit.e <- if(!zmiss) survfit(fits[[etype]], type="aalen", newdata=z[i, ]) else survfit(fits[[etype]], type="aalen") surv <- sfit.e$surv time <- sfit.e$time dH <- c(-log(surv)[1], diff(-log(surv) )) sfit.o <- if(!zmiss) survfit(fits[[nfits]], type="aalen", newdata=z[i, ]) else survfit(fits[[nfits]], type="aalen") Osurv <- stepfun(x=sfit.o$time, y=c(1, sfit.o$surv)) cif <- cbind(cif, cumsum(Osurv(time) * dH)) } if(!zmiss) colnames(cif) <- paste("cif.z", 1:nrow(z), sep="") else colnames(cif) <- "cif.zmean" data.frame(time = time, cif) } ###################### # two sets of covariates for which CIFs are to be predicted z1 <- data.frame(treatment=0, ag=0, hx=1, sz=0, sg=0) z2 <- data.frame(treatment=1, ag=0, hx=1, sz=0, sg=0) model.mat <- model.matrix( ~ time + as.numeric(status) + treatment + as.numeric(ag) + hx + sz + sg, data=gb) # Comparison with Fine & Gray's direct regression # prostate cancer cr1 <- cif(fits, z=rbind(z1, z2), etype=1) # Indirect regression plot(cr1[,1], cr1[,2], type="o", xlab="Time", ylab="Cumulative incidence of Prostate cancer deaths", ylim=c(0, 0.6), main="Comparison of indirect and direct CIF prediction") lines(cr1[,1], cr1[,3],col=2, type="o") cif.cancer <- crr(ftime=model.mat[, 2], fstatus=model.mat[, 3], cov1=model.mat[, 4:8], failcode=2, maxiter=50) # Fine-Gray z.cancer <- predict(cif.cancer, cov1=rbind(unlist(z1), unlist(z2))) lines(z.cancer[,1], z.cancer[,2], col=3) lines(z.cancer[,1], z.cancer[,3], col=4) legend(locator(1), legend=c("z1 (indirect)", "z2 (indirect)", "z1 (FG)", "z2 (FG)"), lty=rep(1, 4), col=c(1,2, 3,4)) # CVD # The two methods give quite different estimates of CIF here cr2 <- cif(fits, z=rbind(z1, z2), etype=2) plot(cr2[,1], cr2[,2], type="o", xlab="Time", ylab="Cumulative incidence of CVD deaths", ylim=c(0, 0.6), main="Comparison of indirect and direct CIF prediction") lines(cr2[,1], cr2[,3],col=2, type="o") cif.cvd <- crr(ftime=model.mat[, 2], fstatus=model.mat[, 3], cov1=model.mat[, 4:8], failcode=3, maxiter=50) z.cvd <- predict(cif.cvd, cov1=rbind(unlist(z1), unlist(z2))) lines(z.cvd[,1], z.cvd[,2], col=3) lines(z.cvd[,1], z.cvd[,3], col=4) legend(locator(1), legend=c("z1 (indirect)", "z2 (indirect)", "z1 (FG)", "z2 (FG)"), lty=rep(1, 4), col=c(1,2, 3,4)) # Other cr3 <- cif(fits, z=rbind(z1, z2), etype=3) plot(cr3[,1], cr3[,2], type="o", xlab="Time", ylab="Cumulative incidence of Other deaths", ylim=c(0, 0.6), main="Comparison of indirect and direct CIF prediction") lines(cr3[,1], cr3[,3],col=2, type="o") cif.oth <- crr(ftime=model.mat[, 2], fstatus=model.mat[, 3], cov1=model.mat[, 4:8], failcode=4, maxiter=50) z.oth <- predict(cif.oth, cov1=rbind(unlist(z1), unlist(z2))) lines(z.oth[,1], z.oth[,2], col=3) lines(z.oth[,1], z.oth[,3], col=4) legend(locator(1), legend=c("z1 (indirect)", "z2 (indirect)", "z1 (FG)", "z2 (FG)"), lty=rep(1, 4), col=c(1,2, 3,4))
CIF_cvd.pdf
Description: Adobe PDF document
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.