For time scales that are truly discrete, Cox proposed the "exact partial likelihood". I call that the "exact" method and SAS calls it the "discrete" method. What we compute is precisely the same; however, they use a clever algorithm which is faster. To make things even more confusing, Prentice introduced an "exact marginal likelihood", which is not implemented in R but which SAS calls the "exact" method.
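To make the distinction concrete: the tie-handling methods differ only in the denominator of each event time's contribution to the log partial likelihood. Below is a minimal sketch in Python (chosen for a self-contained numeric example; this is not the coxph source, and the names tied_lp, other_lp, and tied_loglik are illustrative only). It computes the contribution of a single event time with d tied deaths under the Breslow, Efron, and exact (discrete) methods, following the formulas in section 3.3 of Therneau and Grambsch:

```python
import math
from itertools import combinations

def tied_loglik(tied_lp, other_lp):
    """Log partial-likelihood contribution of one event time.
    tied_lp:  linear predictors of the d subjects who died at this time.
    other_lp: linear predictors of the other subjects still at risk."""
    d = len(tied_lp)
    risk = [math.exp(x) for x in tied_lp + other_lp]
    S = sum(risk)                            # risk-score sum over the full risk set
    T = sum(math.exp(x) for x in tied_lp)    # risk-score sum over the tied deaths
    num = sum(tied_lp)                       # the numerator is the same for all three

    # Breslow: use the full risk-set sum d times
    breslow = num - d * math.log(S)
    # Efron: downweight the tied deaths' contribution step by step
    efron = num - sum(math.log(S - (k / d) * T) for k in range(d))
    # Exact (discrete): sum over every size-d subset of the risk set
    exact_den = sum(math.prod(sub) for sub in combinations(risk, d))
    exact = num - math.log(exact_den)
    return breslow, efron, exact

b, e, x = tied_loglik([0.5, 0.2], [0.0, -0.3, 0.1])
```

The combinatorial sum in the exact method is why it is expensive: with d tied deaths in a risk set of size n there are C(n, d) subsets, whereas Breslow and Efron each cost O(d). With d = 1 (no ties) the three contributions are identical.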
Data are usually not truly discrete, however. More often, ties are the result of imprecise measurement or grouping. The Efron approximation assumes that the data are actually continuous and that we see ties only because of this; it also introduces an approximation at one point in the calculation which greatly speeds up the computation, and numerically the approximation is very good. In spite of the irrational love that our profession has for anything branded with the word "exact", I currently see no reason to ever use that particular computation in a Cox model. I'm not quite ready to remove the option from coxph, but I am certainly not going to devote any effort toward improving that part of the code.

The Breslow approximation is less accurate, but it is the easiest to program and therefore was the only method in early Cox model programs; it persists as the default in many software packages for historical reasons. Truth be told, unless the number of tied deaths is quite large, the difference in results between it and the Efron approximation will be trivial. The worst approach, and the one that can sometimes give seriously strange results, is to artificially remove ties from the data set by adding a random value to each subject's time.

Terry T

--- begin quote --
I didn't know precisely the specifics of each approximation method, so I went back to section 3.3 of Therneau and Grambsch, Extending the Cox Model. I think I now see things more clearly. If I have understood correctly, both the "discrete" option and the "exact" functions assume truly discrete event times in a model approximating the Cox model. The Cox partial likelihood cannot be exactly maximized, or even written down, when there are ties, am I right? In my sample, many of the ties (those within a single observation of the process) are due to the fact that continuous event times are grouped into intervals.
So I think the logistic approximation may not be the best for my problem, even though the estimates on my real data set (shown in my previous post) do give interesting results given the context of my data. I was thinking about distributing the events uniformly within each interval. What do you think about this option? Can I expect a better approximation than applying the Breslow or Efron method directly to the grouped event data? Finally, I guess this becomes a modeling problem more than a computational or algorithmic one.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.