[R] Using coxph with Gompertz-distributed survival data.

Alex F. Bokov Thu, 04 Feb 2010 18:44:42 -0800

Dear list:

I am attempting to use what I thought would be a pretty straightforward 
practical application of Cox regression. I figure users of the survival package 
must have come across this problem before, so I would like to ask you how you 
dealt with it. I have set up an illustrative example and included it at the end 
of this post.


I took a sample of 100 data points from each of two populations ('a' and 'b') 
known to have Gompertz distributions that differ in both their lambda (initial 
rate, which is higher in population 'a') and gamma (acceleration rate, which is 
higher in population 'b') parameters. Each data point represents the age at 
which an experimental animal died.

First question: given the way the parameters differ between the Gompertz 
distributions, the mortality rate will be higher for 'a' younger ages, higher 
for 'b' at older ages, and the assumption of the Cox Proportional Hazards model 
is violated a priori, isn't it? cox.zph confirms this. Yet I found plenty of 
Gompertz parameter values that differ, and lead to differences in survival 
times detectable by coxph, yet pass the cox.zph test. Should I assume that 
cox.zph is insufficiently sensitive and take measures to account for 
non-proportionality of hazards anytime I know I'm looking at longevity data 
(which is believed to have a Gompertz distribution for mammals dying from 'old 
age')?

Second question: supposedly the way to handle such data with coxph is to either 
stratify it by age or include an interaction term with age in the formula. The 
first option is not optimal, because the hazard for a Gompertz distribution 
changes continuously and there are no natural intervals within which the hazard 
ratio is relatively constant. Therefore, I would like to use the second option, 
but including the age term in the formula produces an error that says "Error in 
fitter(X, Y, strats, offset, init, control, weights = weights,  :  NA/NaN/Inf 
in foreign function call (arg 6)". My question is, what would be the correct 
way to include age in the formula?

If you have read this far, thank you kindly for your time. Below is code for 
reproducing the example I explained above.

# create the data
a<-c(1048, 773, 1072, 753, 888, 1038, 852, 950, 1335, 1227, 904, 634, 980, 
1075, 1308, 787, 1079, 1151, 677, 1110, 363, 936, 774, 644, 1080, 1055, 1119, 
975, 941, 1148, 1014, 541, 1297, 584, 847, 1136, 793, 1171, 985, 934, 852, 550, 
944, 1165, 1190, 99, 1227, 1052, 1106, 888, 884, 555, 862, 1241, 841, 987, 
1162, 1028, 1009, 1102, 1082, 1009, 1200, 902, 1215, 1121, 1177, 882, 766, 
1366, 1190, 755, 958, 38, 925, 1131, 1031, 639, 704, 1097, 820, 570, 1029, 
1205, 1284, 1139, 522, 1197, 1268, 1376, 842, 1310, 959, 1160, 824, 652, 1121, 
1262, 1006, 1021);

b<-c(797, 897, 993, 1174, 998, 1117, 583, 698, 1125, 796, 1055, 953, 847, 892, 
875, 1108, 1090, 1101, 1104, 991, 882, 1036, 889, 843, 1129, 1018, 936, 1035, 
906, 1192, 1028, 1109, 790, 897, 544, 978, 1108, 1004, 730, 1093, 845, 1014, 
1040, 1080, 921, 958, 971, 950, 1026, 882, 1055, 703, 901, 991, 863, 1043, 999, 
784, 908, 987, 1040, 760, 887, 1028, 808, 1077, 812, 843, 1002, 639, 905, 808, 
850, 1112, 736, 851, 1008, 990, 516, 1015, 942, 993, 1127, 959, 963, 1069, 940, 
815, 926, 1005, 983, 1093, 898, 901, 1132, 1011, 808, 905, 1129, 840);

# format the data for coxph
age<-c(a,b); group<-as.factor(c('a','b'),c(100,100)));

# run coxph; for simplicity, the data is uncensored
ab.cox<-coxph(Surv(age)~group);

# the results indicate a singificant difference between the groups
ab.cox;

# however, it appears that the hazards are not proportional and the result is 
not interpretable
cox.zph(ab.cox);

# including age in an interaction term leads to an error
ab.int.cox<-coxph(Surv(age)~group+group:age);

# the following is like above; 'c' is from a population differs from 'a' in 
both parameters
# and coxph indicates a highly significant difference, but this time the data 
seems to not
# violate the proportional hazards assumption

c<-c(526, 858, 36, 814, 402, 902, 653, 779, 776, 971, 789, 824, 680, 1017, 957, 
607, 1142, 1144, 888, 861, 868, 1052, 825, 654, 443, 1023, 727, 612, 1017, 888, 
811, 820, 780, 846, 844, 272, 1083, 855, 903, 783, 1036, 1026, 462, 1007, 777, 
1124, 1077, 1041, 900, 963, 332, 740, 1012, 1017, 705, 864, 1053, 643, 1195, 
795, 1040, 792, 714, 971, 832, 558, 1107, 1038, 1149, 448, 605, 921, 772, 994, 
1034, 949, 788, 1013, 788, 1038, 956, 850, 1045, 747, 967, 814, 661, 657, 730, 
966, 676, 344, 743, 643, 1004, 766, 942, 1040, 1067, 409);

age2<-c(a,c); group2<-as.factor(rep(c('a','c'),c(100,100)));
ac.cox<-coxph(Surv(age2)~group2);
ac.cox;
cox.zph(ac.cox);

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Using coxph with Gompertz-distributed survival data.

Reply via email to