Hi, Thanks for your response.
On Thu, Nov 14, 2013 at 4:44 PM, Terry Therneau <thern...@mayo.edu> wrote:
> I think that your data is censored, not truncated.
> For a fault introduced 1/2005 and erased 2/2006, duration = 13 months
> For a fault introduced 4/2010 and still in existence at the last
> observation 12/2010, duration > 8 months.
> For a fault introduced before 2004, erased 3/2005, in a machine installed
> 2/1998, the duration is somewhere between 15 and 87 months.
> For a fault introduced before 2004, machine installed 5/2000, still
> present 11/2010 at last check, the duration is > 126 months.
>
> For type=interval2 the data would be (13,13), (8,NA), (15,87), (126, NA).

That is how I have set it up. My problem is that I have no information
about when a fault was introduced if it was introduced before 2004.
This is about the lifespan of software faults in the code, so in your
example I could not set the upper bound to 87 months.

The only thing I know for sure is that the first software release was
in 1994. So for a fault which is observed from 2004 up to 2005, I set
the range to (12, 120+12), that is, 12 months observed plus the 10 years
from 1994 to 2004. The estimate is almost the same if I use (12, NA),
and this gives me an upper bound. I tried (12, 12) to get the lower
bound. I also tried with 5 years instead of 10; this seems to give an
over-estimate too.

Could I use some property of the data from ]2004;2010], the mean or the
median for instance, to give an average extension to these faults?

Thanks in advance.

>
> Terry T.
>
>
> On 11/14/2013 05:00 AM, r-help-requ...@r-project.org wrote:
>>
>> Hi,
>>
>> I would like to know how to handle truncated data.
>> My intent is to obtain the survival curve of a software fault in order
>> to have some information about fault lifespan.
>>
>> I have some observations of a software system between 2004 and 2010.
>> The system was first released in 1994.
>> The event considered is the disappearance of a software fault. The
>> faults can have been introduced at any time between 1994 and 2010.
>> But for faults introduced before 2004, there is no way to know
>> their age.
>>
>> I used the Surv and survfit functions with type interval2.
>> For the faults that are first observed in 2004, I set the lower bound
>> to the lifespan observed between 2004 and 2010.
>>
>> How could I set the upper bound? Using 1994 as a starting point does
>> not seem to be meaningful. Neither does using only the lower bound.
>>
>> Should I consider another survival estimator?
>>
>> Thanks in advance.

-- 
Nicolas Palix
Tel: +33 4 76 51 46 27
http://membres-liglab.imag.fr/palix/
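
P.S. For anyone following the thread, here is a minimal sketch in R of the
interval2 encoding discussed above. The four (lower, upper) pairs are the
hypothetical durations from Terry's reply, in months, not real data, and the
names lower, upper, faults, status and fit are only illustrative. It assumes
the survival package is installed.

```r
library(survival)

# Hypothetical durations in months, taken from the examples in Terry's reply.
# An NA upper bound marks a right-censored fault (still present at last check).
lower <- c(13,  8, 15, 126)
upper <- c(13, NA, 87,  NA)

# type = "interval2": equal bounds are exact events, an NA upper bound is
# right censoring, and two distinct finite bounds are interval censoring.
faults <- Surv(lower, upper, type = "interval2")

# Status codes stored by Surv: 0 = right censored, 1 = event, 3 = interval.
status <- unclass(faults)[, "status"]

# Nonparametric estimate of the survival curve from the censored intervals.
fit <- survfit(faults ~ 1)
print(fit)
```

The alternatives compared in the message, (12, 132), (12, NA) and (12, 12),
can be tried by changing the corresponding entries of lower and upper and
refitting.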