Re: [R] survival analysis: interval censored data

David Winsemius Wed, 28 Sep 2011 13:35:12 -0700


On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote:



Hello Terry:

I have attached the araceae data set.

The usual survival analysis via the Kaplan-Meier method only makes estimates at the times of events. When you tabulate your data, you see that there were no events for the missing (starting) "time" rows in those categories during the intervals that you are questioning as missing:


xtabs( ~ time+time2+categoria+event, data=araceae)
, , categoria = C, event = 0

      time2
time   2005 2006 2007 2008 2009 2010
  2004    0   23    1    3    1   22
  2005    0    0    0    0    0    0
  2007    0    0    0    0    4   19
  2008    0    0    0    0    0    0
  2009    0    0    0    0    0    0

, , categoria = E, event = 0

      time2
time   2005 2006 2007 2008 2009 2010
  2004    0   22    0    7    3   21
  2005    0    0    1    1    0    0
  2007    0    0    0    0    0   29
  2008    0    0    0    0    0    0
  2009    0    0    0    0    0    1


, , categoria = C, event = 1

      time2
time   2005 2006 2007 2008 2009 2010
  2004    0    5    2    3    0    3
  2005    0    0    0    0    0    0
  2007    0    0    0    2    3    2
  2008    0    0    0    0    1    0
  2009    0    0    0    0    0    0

, , categoria = E, event = 1

      time2
time   2005 2006 2007 2008 2009 2010
  2004    7    2    1    1    3    4
  2005    0    0    0    1    0    0
  2007    0    0    0    3    1    3
  2008    0    0    0    0    0    0
  2009    0    0    0    0    0    0
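
To illustrate the point that a Kaplan-Meier summary only has rows at event times, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical and are not taken from the araceae file; it assumes the survival package is installed.

```r
library(survival)

# Hypothetical toy data (not from the araceae file): five subjects,
# status 1 = event observed, 0 = right-censored.
toy <- data.frame(time   = c(1, 2, 3, 4, 5),
                  status = c(1, 0, 1, 0, 1))

fit <- survfit(Surv(time, status) ~ 1, data = toy)

# summary() reports rows only at times where an event occurred;
# the censoring times 2 and 4 do not get rows of their own.
event_times <- summary(fit)$time
print(event_times)
```

The censored observations still count toward n.risk at earlier times; they just do not produce a row of their own.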

when I use this:

surara<-survfit(Surv(time,time2,event)~categoria)

Call: survfit(formula = Surv(time, time2, event) ~ categoria)

            records n.max n.start events median 0.95LCL 0.95UCL
categoria=C      94    63       0     21     NA      NA      NA
categoria=E     111    77       0     26     NA      NA      NA
summary(surara)
Call: survfit(formula = Surv(time, time2, event) ~ categoria)

                categoria=C
 time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI
 2006     63       5       0       23    0.921  0.0341        0.856        0.990
 2007     35       2      30        1    0.868  0.0483        0.778        0.968
 2008     62       5       1        3    0.798  0.0536        0.700        0.910
 2009     55       4       0        5    0.740  0.0570        0.636        0.861
 2010     46       5       0       41    0.660  0.0611        0.550        0.791

                categoria=E
 time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI
 2005     71       7       3        0    0.901  0.0354        0.835        0.973
 2006     67       2       0       22    0.875  0.0391        0.801        0.955
 2007     43       1      36        1    0.854  0.0432        0.774        0.943
 2008     77       5       0        8    0.799  0.0469        0.712        0.896
 2009     64       4       1        3    0.749  0.0502        0.657        0.854
 2010     58       7       0       51    0.658  0.0545        0.560        0.774

You see that your first survfit object is offering a simple sum of the 'time2' columns of that tabulation as its 'n.event' values. Its 'n.risk' tabulation is not taking note of whether a case started in any particular prior interval. The n.risk sum appears to be the sum of persons surviving from the prior year, less any decedents, plus any entrants, as reflected in "future" events on that row. You will notice that there are missing years even in that report: 2004 and 2005 for category C and 2004 for category E, since there are no events in the columns for those 'time2' values.
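
The column-sum observation can be checked by hand. The sketch below retypes the two event = 1 slices of the xtabs() output into matrices (the object names `ev_C` and `ev_E` are mine, for illustration only) and sums each 'time2' column.

```r
years  <- as.character(2005:2010)
starts <- c("2004", "2005", "2007", "2008", "2009")

# Counts copied from the "categoria = C, event = 1" slice of the xtabs() output
ev_C <- matrix(c(0, 5, 2, 3, 0, 3,
                 0, 0, 0, 0, 0, 0,
                 0, 0, 0, 2, 3, 2,
                 0, 0, 0, 0, 1, 0,
                 0, 0, 0, 0, 0, 0),
               nrow = 5, byrow = TRUE,
               dimnames = list(time = starts, time2 = years))

# Counts copied from the "categoria = E, event = 1" slice
ev_E <- matrix(c(7, 2, 1, 1, 3, 4,
                 0, 0, 0, 1, 0, 0,
                 0, 0, 0, 3, 1, 3,
                 0, 0, 0, 0, 0, 0,
                 0, 0, 0, 0, 0, 0),
               nrow = 5, byrow = TRUE,
               dimnames = list(time = starts, time2 = years))

# Events per 'time2' year, ignoring the starting 'time' row
n_event_C <- colSums(ev_C)
n_event_E <- colSums(ev_E)
print(n_event_C)  # 2005..2010: 0 5 2 5 4 5
print(n_event_E)  # 2005..2010: 7 2 1 5 4 7
```

The nonzero sums match the n.event column of summary(surara) for each category, and the totals (21 and 26) match the 'events' column of the printed survfit object. The zero sum for 2005 in category C is why that year has no row.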


but when I included type='interval',

suraraint<-survfit(Surv(time,time2,event,type='interval')~categoria) # still need to fix the interval part!!!
summary(suraraint)

Call: survfit(formula = Surv(time, time2, event, type = "interval") ~
    categoria)

                categoria=C
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 2004  95.00   13.14    0.862  0.0354        0.795        0.934
 2007  31.86    7.19    0.667  0.0695        0.544        0.818
 2008   1.67    1.67    0.000     NaN           NA           NA

                categoria=E
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
 2004  112.0   18.47    0.835  0.0351        0.769        0.907
 2005   40.5    1.06    0.813  0.0401        0.738        0.896
 2007   37.5    7.46    0.651  0.0620        0.540        0.785

The second object's n.event, when Surv() was constructed with type="interval", has values based on the starting 'time' rows, but I am unable to deduce the estimating algorithm. I remember Therneau saying it wasn't a simple algorithm. The 2008 row in category C has one entry of 1 in the next year, and there was no censoring for C entrants in that year. Why the n.event is 1.67 I cannot say, but at least the n.event does not exceed the n.risk. The code or a copy of Therneau and Grambsch would be sensible places to look for an answer, but my initial efforts in those directions have not illuminated me.
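
One thing worth checking is how the status column is coded. According to ?Surv, with type="interval" the status values mean 0 = right censored, 1 = event at 'time', 2 = left censored, 3 = interval censored, so a 0/1 'event' column is not read as "event somewhere between time and time2". That may be part of why the output is indexed by the starting 'time' values, though I have not verified this against the araceae file. A possible sketch of an interval encoding follows, using type="interval2" on made-up records. The data frame `toy` is hypothetical, and the reading of 'event' in the comments is an assumption about the study design, not something confirmed in this thread.

```r
library(survival)

# Hypothetical records, NOT the araceae data. Assumption: a plant with
# event = 1 died somewhere between census 'time' and census 'time2';
# a plant with event = 0 was still alive at 'time2'.
toy <- data.frame(time  = c(2004, 2004, 2007),
                  time2 = c(2006, 2010, 2008),
                  event = c(1, 0, 1))

# type = "interval2" takes (left, right) bounds for each subject;
# right = NA marks an observation that is right-censored at 'left'.
left  <- ifelse(toy$event == 1, toy$time,  toy$time2)
right <- ifelse(toy$event == 1, toy$time2, NA)

s <- Surv(left, right, type = "interval2")
print(s)

# Internal status codes: 3 = interval censored, 0 = right censored
codes <- unclass(s)[, "status"]
print(codes)
```

A Surv object built this way could then be given to survfit() with the grouping variable on the right-hand side of the formula. Whether this encoding is appropriate depends on what 'time', 'time2' and 'event' actually record in the data.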


--
David.


it does not calculate survival for every year

I have a one-year interval between each census



________________________________
From: Terry Therneau <thern...@mayo.edu>
To: Ruth Arias <rueu...@yahoo.es>
CC: r-help@r-project.org
Sent: Wednesday, 28 September 2011 16:00
Subject: Re:  survival analysis: interval censored data

You have still not given me enough information to reproduce your
problem. "Why doesn't it include all years?" I have no way of knowing,
since we have no data.

--- begin included message --
hello David

when I use type= 'interval'

Call: survfit(formula = Surv(ingreso, fecha, estado, type = "interval") ~
    categoria)

and when I use just

Call: survfit(formula = Surv(ingreso, fecha, estado) ~ categoria)

I don't know why, when I use type = "interval", it does not calculate
survival for every year


regards
<araceae.txt>
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT
