I wish to simulate event times where the censoring is informative, and to compare the quality of the parameter estimates from a Cox PH model with the estimates obtained from event times generated with non-informative censoring. However, I am struggling to do this, and I have concluded that the problem is not a technical flaw in my code but rather that I do not understand what is meant by informative and non-informative censoring.
My approach is to simulate an event time T that depends on a vector of covariates x through the hazard function h(t|x) = lambda*exp(beta'*x)*v*t^(v-1). This corresponds to T ~ Weibull(lambda(x), v), where the scale parameter lambda(x) = lambda*exp(beta'*x) depends on x and the shape parameter v is fixed. I have N subjects with T_i ~ Weibull(lambda_T(x_i), v_T), lambda_T(x_i) = lambda_T*exp(beta_T'*x_i), for i = 1,...,N. Here I assume the regression coefficients are p-dimensional.

I generate informative censoring times C_i ~ Weibull(lambda_C(x_i), v_C), lambda_C(x_i) = lambda_C*exp(beta_C'*x_i), and compute Y_inf_i = min(T_i, C_i) and an event indicator delta_inf_i = 1 if T_i <= C_i (an observed event), and delta_inf_i = 0 if T_i > C_i (informatively censored: event not observed). I am convinced this is informative censoring because, as long as beta_T != 0 and beta_C != 0, the data-generating processes for T and C both depend on x for each subject.

In contrast, I generate non-informative censoring times D_i ~ Weibull(lambda_D*exp(beta_D), v_D), and compute Y_ninf_i = min(T_i, D_i) and an event indicator delta_ninf_i = 1 if T_i <= D_i (an observed event), and delta_ninf_i = 0 if T_i > D_i (non-informatively censored: event not observed). Here beta_D is a scalar, so D_i does not depend on x_i.

I "scale" the simulation by choosing the lambda_T, lambda_C and lambda_D parameters such that T_i < C_i and T_i < D_i often enough to achieve X% of censored subjects for both Y_inf_i and Y_ninf_i.

The problem is that even for, say, 30% censoring (which I think is high), the Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased, when I expected the estimates using Y_inf to be biased. I think I see why: however different beta_C is from beta_T, a censored subject can presumably influence the estimation of beta_T only by affecting the set of subjects at risk at any time t, but this does not change the fact that every single Y_inf_i with delta_inf_i = 1 will have been generated using beta_T only.
Thus I do not see how my simulation can possibly produce biased estimates for beta_T using Y_inf. But then what is informative censoring if not based on this approach? Any help would be greatly appreciated.
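
For concreteness, a minimal sketch of the data-generating process described above. It is not the original code: it is written in Python with only the standard library, it assumes a single standard-normal covariate (p = 1), and the function names (weibull_ph_time, simulate) and all parameter values are hypothetical placeholders. Times are drawn by inverse transform: since S(t|x) = exp(-lambda*exp(beta'*x)*t^v), solving S(T|x) = U for U ~ Uniform(0,1) gives T = (-log(U) / (lambda*exp(beta'*x)))^(1/v).

```python
import math
import random


def weibull_ph_time(rng, lam, v, linpred):
    """Draw a time with hazard lam*exp(linpred)*v*t^(v-1) by inverse transform."""
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is always defined
    return (-math.log(u) / (lam * math.exp(linpred))) ** (1.0 / v)


def simulate(n, seed=1,
             lam_t=1.0, v_t=1.5, beta_t=0.5,    # event time T (assumed values)
             lam_c=0.3, v_c=1.5, beta_c=-0.5,   # covariate-dependent censoring C
             lam_d=0.3, v_d=1.5, beta_d=0.0):   # covariate-free censoring D
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                            # single covariate, p = 1
        t = weibull_ph_time(rng, lam_t, v_t, beta_t * x)   # T_i depends on x_i
        c = weibull_ph_time(rng, lam_c, v_c, beta_c * x)   # C_i depends on x_i
        d = weibull_ph_time(rng, lam_d, v_d, beta_d)       # D_i ignores x_i
        rows.append({
            "x": x,
            "y_inf": min(t, c), "delta_inf": int(t <= c),   # 1 = event observed
            "y_ninf": min(t, d), "delta_ninf": int(t <= d),
        })
    return rows


rows = simulate(5000)
cens_inf = 1.0 - sum(r["delta_inf"] for r in rows) / len(rows)
cens_ninf = 1.0 - sum(r["delta_ninf"] for r in rows) / len(rows)
print(f"censored fraction with C (depends on x): {cens_inf:.3f}")
print(f"censored fraction with D (ignores x):    {cens_ninf:.3f}")
```

Note that in this sketch T_i and C_i are drawn independently once x_i is fixed; the only link between them is the shared covariate. The censored fractions can be tuned by changing lam_c and lam_d relative to lam_t.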