I've taken another look at survexp with an eye to efficiency. As it stands, the code holds 3-4 (I think it's 4) active copies of the X matrix at one point; this is likely the reason it takes so much memory when you have a large data set. Some of this is history: key parts of the code were written long before I understood all the "tricks" for smaller memory use in S (Splus or R), and one copy is due to the loss of the COPY= argument when going from Splus to R.
I can see how to redo it and reduce this to a single copy, but that involves 3 R functions and 3 C routines. I'll add it to my list, but don't expect quick results, as there is a long list in front of it. It's been a good summer, but as one of my colleagues put it, "No vacation goes unpunished."

As a mid-term suggestion, I would use a subsample of your data. With data set sizes like the ones you describe, a 20% subsample will give all the precision that you need. Specifically (a sketch of these steps follows below):

1. Save the results of your current Cox model; call it fit1.
2. Select a subset of the data.
3. Fit a new Cox model on the subset, with the options iter=0, init=fit1$coef. This ensures that the subset fit has exactly the same coefficients as the original.
4. Use survexp on the subset fit.

Terry Therneau
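For concreteness, here is a minimal sketch of those four steps. The data frame mydata, the variables time, status, age, and sex, and the 20% sampling fraction are placeholders, not from the original post; iter.max = 0 (set via coxph.control) is what I take "iter=0" above to be shorthand for.

    library(survival)

    ## 1. Fit and save the Cox model on the full data (hypothetical formula)
    fit1 <- coxph(Surv(time, status) ~ age + sex, data = mydata)

    ## 2. Select a 20% subsample of the rows
    set.seed(1)
    sub <- mydata[sample(nrow(mydata), size = round(0.2 * nrow(mydata))), ]

    ## 3. Refit on the subset with the coefficients held fixed:
    ##    init supplies fit1's coefficients and iter.max = 0 suppresses any
    ##    further iteration, so fit2 keeps exactly those coefficients.
    fit2 <- coxph(Surv(time, status) ~ age + sex, data = sub,
                  init = fit1$coef,
                  control = coxph.control(iter.max = 0))

    ## 4. Expected survival based on the subset fit (a coxph fit may be
    ##    supplied as the ratetable argument of survexp)
    esurv <- survexp(~ 1, data = sub, ratetable = fit2)

With zero iterations the model is simply evaluated at the supplied coefficients, so predictions from fit2 are driven by the full-data fit while survexp only has to process the subset.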