Terry, I figured out that variance of log(-log(S)) should be (1/H^2)var(H), not (1/S^2)var(H)!
Thanks John ________________________________ e...@mayo.edu>; "r-help@r-project.org" <r-help@r-project.org> Sent: Monday, July 21, 2014 11:41 AM Subject: Re: standard error of survfit.coxph() Dear Terry, I was trying to use your explanation of the standard error estimate from survfit.coxph() to verify the standard error estimates for the method of log(log(S)), but couldn't get the estimates correct. Here is an example using the lung dataset: > fit<-coxph(Surv(time,status)~wt.loss,lung) > surv<-survfit(fit,newdata=lung[2:5,])$surv > logstd<-survfit(fit,newdata=lung[2:5,])$std.err > logstd.time<-survfit(fit,newdata=lung[2:5,])$time ## std error of cumulative hazard at time=60 > logstd<-logstd[logstd.time==60,] > logstd [1] 0.01965858 0.01965858 0.01941871 0.01969248 ## survival S at months 60 > surv<-surv[logstd.time==60,] > surv 2 3 4 5 0.9249238 0.9249238 0.9253038 0.9263392 Since survfit()$std.err was for cumulative hazard H=log(S), thus based on your explanation below, the standard error for log-log method is: (1/S^2)var(H) > loglogstd<-sqrt(1/surv^2*(logstd^2)) > loglogstd 2 3 4 5 0.02125427 0.02125427 0.02098631 0.02125839 ## the upper confidence interval should be > exp(-exp(log(-log(surv))-qnorm(0.975)*loglogstd)) 2 3 4 5 0.9278737 0.9278737 0.9282031 0.9292363 But this is different from the output using summary.survfit: > summary(survfit(fit,newdata=lung[2:5,],conf.type='log-log'),times=60)$upper[1,] 2 3 4 5 0.9534814 0.9534814 0.9535646 0.9548478 Can you please suggest what I did wrong in the calculation? Thanks very much! John ________________________________ To: "Therneau, Terry M., Ph.D." <thern...@mayo.edu>; "r-help@r-project.org" <r-help@r-project.org> Sent: Monday, June 30, 2014 10:46 AM Subject: Re: standard error of survfit.coxph() [[elided Yahoo spam]] John ________________________________ From: "Therneau, Terry M., Ph.D." <thern...@mayo.edu> Sent: Monday, June 30, 2014 6:04 AM Subject: Re: standard error of survfit.coxph() 1. The computations "behind the scenes" produce the variance of the cumulative hazard. This is true for both an ordinary Kaplan-Meier and a Cox model. Transformations to other scales are done using simple Taylor series. H = cumulative hazard = log(S); S=survival var(H) = var(log(S)) = the starting point S = exp(log(S)), so var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = S^2 var(H) var(log(log(S)) is approx (1/S^2) var(H) 2. At the time it was written, summary.survfit was used only for printing out the survival curve at selected times, and the audience for the printout wanted std(S). True, that was 20 years ago, but I don't recall anyone ever asking for summary to do anything else. Your request is not a bad idea. Note however that the primary impact of using log(S) or S or log(log(S)) scale is is on the confidence intervals, and they do appear per request in the summary output. Terry T. On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote: > Message: 9 > Date: Fri, 27 Jun 2014 12:39:29 -0700 > To:"r-help@r-project.org" <r-help@r-project.org> > Subject: [R] standard error of survfit.coxph() > Message-ID: > <1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com> > Content-Type: text/plain > > Hi, can anyone help me to understand the standard errors printed in the > output of survfit.coxph()? > > time<-sample(1:15,100,replace=T) > > status<-as.numeric(runif(100,0,1)<0.2) > x<-rnorm(100,10,2) > > fit<-coxph(Surv(time,status)~x) > ??? ### method 1 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$std.err > > ???????????? [,1]??????? [,2]??????? [,3]??????? [,4]?????? [,5] > ?[1,] 0.000000000 0.000000000 0.000000000 0.000000000 0.00000000 > ?[2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 > ?[4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 > ?[7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 > ?[9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394 > [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 > [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413 > [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433 > [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860 > [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111 > > survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log')$time > ?[1]? 1? 2? 3? 4? 5? 6? 7? 8? 9 10 11 12 13 14 15 > > ??? ### method 2 > > summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], > conf.type='log'),time=10)$std.err > > ????????????? 1????????? 2????????? 3????????? 4????????? 5 > [1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532 > > By reading the help of ?survfit.object and ?summary.survfit, the standard > error provided in the output of method 1 (survfit()) was for cumulative > hazard-log(survival), while the standard error provided in the output of > method 2 (summary.survfit()) was for survival itself, regardless of how you > choose the value for "conf.type" ('log', 'log-log' or 'plain'). This explains why the standard error output is different between method 1 (10th row) and method 2. > > My question is how do I get standard error estimates for log(-log(survival))? > > Thanks! > > John [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.