Hi Dennis,
Thanks for your feedback.
I do apologise, I omitted the lines of code that created the censoring 
variable, I was focused more on altering the actual
outcome times. So it is actually more like this,

if (outcomeType == 'Weibull'){
 Y[i] <- rweibull(1, shape, shape))               
}
if (Y[i] >  censorT){
Y[i] <- censorT
death[i] <- 0}
else
{Y[i]  <- Y[i]
death[i] <- 1
}
        }

Thankyou for the tip about the random seed.
Lucy



-----Original Message-----
From: Dennis Murphy [mailto:djmu...@gmail.com] 
Sent: Monday, 24 March 2014 4:12 PM
To: Lucy Leigh
Subject: Re: [R] Simulating from the Weibull with right censoring

Hi:

I don't think you're going about this the right way. It appears to me that your 
(unnecessary) loop is performing truncation instead, and there is a sharp 
distinction in the survival literature between censoring and truncation.

First of all, let's introduce you to the concept of vectorized calculations in 
R. To reproduce your code (with a random seed because you didn't set one), you 
could do something like this:

set.seed(5489)    # set a random seed for reproducibility of results
nSubjects <- 1000

# change these to the values you're using shape <- 1 scale <- 2

# Simulate nSubjects values from a Weibull(shape, scale) dist'n Y <- 
rweibull(nSubjects, shape, scale)

# Plot a histogram of the generated sample with the Weibull pdf hist(Y, 
probability = TRUE, main = "Weibull(1, 2) random deviates") v <- seq(min(Y), 
max(Y), length = 100) lines(v, dweibull(v, 1, 2), col = "blue", lwd = 2)

# How many values of Y are > 5?
sum(Y > 5)       # should be 71

# Truncate the values larger than 5 to 5:
Y1 <- Y                 # Clone Y first
Y1[Y1 > 5] <- 5      # replace values > 5 with 5
sum(Y1 > 5)          # should be 0

# Censor values of Y greater than 5:
censor <- 1 * (Y > 5)        # 1 = Yes, 0 = No
table(Y <= 5, censor)       # cross-tabulation

With censoring, you create a companion vector to Y which contains indicators of 
whether each value of Y is censored or not according to the censoring 
criterion. The survival package, for example, uses the
Surv() function to pair a quantitative variable with a corresponding vector of 
censoring indicators, as in

Surv(Y, censor)

There are a number of good books on survival analysis floating around.
One whose datasets are wrapped up in an R package is by Klein and Moeschberger, 
who discuss in some detail the distinction between censoring and truncation. 
You can have left censoring (e.g. in environmetrics where limits of detection 
are relevant) or right censoring (more common in medical and engineering 
contexts); similarly for left and right truncation. There are several types of 
right censoring as well as interval censoring; the code above for defining the 
censoring vector corresponds to simple type I right censoring with a single, 
fixed censoring time.

Notice the absence of loops. This is because the vast majority of the functions 
in base R are vectorized, which means you can operate on the vector as a single 
entity rather than looping through each element.
This is one of the primary differences between R and conventional programming 
languages; another is the variety of ways in which to index data objects in R 
(e.g., look at the way the subset of Y1 was chosen whose values were replaced 
by 5). A reading of the relevant sections in An Introduction to R, which comes 
with the software, would appear to be worth your time.

HTH,
Dennis


On Sun, Mar 23, 2014 at 9:14 PM, Lucy Leigh <lucy.le...@newcastle.edu.au> wrote:
> Hi everyone,
>
> I am currently attempting to simulate some survival data, based on a 
> Weibull model. I basically need to simulate some survival data so that I can 
> then test out a program I'm writing, before applying it to some real data.
>
> I've managed to create failure time data, using the rweibull function. I.e.
> Y[i] <- rweibull(1, shape, scale)
> For a variety of shape and scale parameters etc.
>
> I wish to add in right censoring. If I am considering my failure times 
> as deaths, and the censoring time as a study cut-off date (for 
> example, 5 years from baseline), would it be correct to do something 
> like,
>
> for(i in 1:nSubjects){
> if (Y[i] >  5){
> Y[i] <- 5}
> else
> {Y[i]  <- Y[i]
> }}
>
> I guess my question is, is it statistically sound to impose the right 
> censoring after estimating the distribution with no censoring?
> I am leaning towards yes, because in an ideal world where we followed 
> all participants until they died, the distribution would be as 
> estimated by the rweibull above....and assuming that the right 
> censoring is independent, then cutting of the measurements at a pre-specified 
> time wouldn't affect the outcome at all. But I guess I am just looking for a 
> second opinion?
>
> Thanks in advance, this list has been immensely helpful with some 
> previous questions I have had about the survival package.
> Lucy
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to