Re: [R] Fit continuous distribution to truncated empirical values

Michele Mazzucco Mon, 14 Nov 2011 08:24:13 -0800

David,

here is the smallest dataset


# Bid   Price   Survival        Censored
0.03    0.029   1       1
0.03    0.029   11      1
0.03    0.029   10      1
0.03    0.029   9       1
0.03    0.029   8       1
0.03    0.029   7       1
0.03    0.029   6       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   1       1
0.03    0.029   1       1
0.03    0.029   9       1
0.03    0.029   8       1
0.03    0.029   7       1
0.03    0.029   6       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   16      1
0.03    0.029   15      1
0.03    0.029   14      1
0.03    0.029   13      1
0.03    0.029   12      1
0.03    0.029   11      1
0.03    0.029   10      1
0.03    0.029   9       1
0.03    0.029   8       1
0.03    0.029   7       1
0.03    0.029   6       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   6       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   7       1
0.03    0.029   6       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   6       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   30      1
0.03    0.029   29      1
0.03    0.029   28      1
0.03    0.029   27      1
0.03    0.029   26      1
0.03    0.029   25      1
0.03    0.029   24      1
0.03    0.029   23      1
0.03    0.029   22      1
0.03    0.029   21      1
0.03    0.029   20      1
0.03    0.029   19      1
0.03    0.029   18      1
0.03    0.029   17      1
0.03    0.029   16      1
0.03    0.029   15      1
0.03    0.029   14      1
0.03    0.029   13      1
0.03    0.029   12      1
0.03    0.029   11      1
0.03    0.029   10      1
0.03    0.029   9       1
0.03    0.029   8       1
0.03    0.029   7       1
0.03    0.029   6       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   8       1
0.03    0.029   7       1
0.03    0.029   6       1
0.03    0.029   5       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.029   4       1
0.03    0.029   3       1
0.03    0.029   2       1
0.03    0.029   1       1
0.03    0.027   71      0
0.03    0.027   70      0
0.03    0.027   69      0
0.03    0.027   68      0
0.03    0.027   67      0
0.03    0.027   66      0
0.03    0.027   65      0
0.03    0.027   64      0
0.03    0.027   63      0
0.03    0.027   62      0
0.03    0.027   61      0
0.03    0.027   60      0
0.03    0.027   59      0
0.03    0.027   58      0
0.03    0.027   57      0
0.03    0.027   56      0
0.03    0.027   55      0
0.03    0.027   54      0
0.03    0.027   53      0
0.03    0.027   52      0
0.03    0.027   51      0
0.03    0.027   50      0
0.03    0.027   49      0
0.03    0.027   48      0
0.03    0.027   47      0
0.03    0.027   46      0
0.03    0.027   45      0
0.03    0.027   44      0
0.03    0.027   43      0
0.03    0.027   42      0
0.03    0.027   41      0
0.03    0.027   40      0
0.03    0.027   39      0
0.03    0.027   38      0
0.03    0.027   37      0
0.03    0.027   36      0
0.03    0.027   35      0
0.03    0.027   34      0
0.03    0.027   33      0
0.03    0.027   32      0
0.03    0.027   31      0
0.03    0.027   30      0
0.03    0.027   29      0
0.03    0.027   28      0
0.03    0.027   27      0
0.03    0.027   26      0
0.03    0.027   25      0
0.03    0.027   24      0
0.03    0.027   23      0
0.03    0.027   22      0
0.03    0.027   21      0
0.03    0.027   20      0
0.03    0.027   19      0
0.03    0.027   18      0
0.03    0.027   17      0
0.03    0.027   16      0
0.03    0.027   15      0
0.03    0.027   14      0
0.03    0.027   13      0
0.03    0.027   12      0
0.03    0.027   11      0
0.03    0.027   10      0
0.03    0.027   9       0
0.03    0.027   8       0
0.03    0.027   7       0
0.03    0.027   6       0
0.03    0.027   5       0
0.03    0.027   4       0
0.03    0.027   3       0
0.03    0.027   2       0
0.03    0.027   1       0


This is column # 3
1 11 10  9  8  7  6  5  4  3  2  1  1  1  9  8  7  6  5  4  3  2  1 16 15 14 
13 12 11 10  9  8  7  6  5  4  3  2  1  6  5  4  3  2  1  7  6  5  4  3  2  1
5  4  3  2  1  4  3  2  1  6  5  4  3  2  1 30 29 28 27 26 25 24 23 22 21 20
19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  8  7  6  5  4  3  2
1  4  3  2  1 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51
50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25
24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1


This is the script I have used for fitting the dataset to a Weibull distribution


my_plot <- function() {
        require(plotrix) # for plotCI
        require(MASS) # for fitdistr
        require(survival) # for survival analysis
        require(fitdistrplus) # for fitdist


        lifetimes=read.table("lifetime.txt", header=F, comment.char="#")
        s = Surv(lifetimes$V3, lifetimes$V4)
        mfit = survfit(s~V1, data=lifetimes, conf.type="plain")
        
        cdf=ecdf(lifetimes$V3)
        empirical = numeric(24)
        T = seq(1,24)
        for (i in T) empirical[i] = cdf(i)

        fit = fitdist(lifetimes$V3, "weibull")
        plot(T, 1- pweibull(T, fit$estimate[1], fit$estimate[2]), type="l", 
col="red", lwd=2, lty=1, xlim=c(1,24), xlab="Time [hours]", ylab="S(t)")
        lines(T, 1 - empirical, col="blue", lwd=2, lty=2)
        plotCI(T, mfit$surv[T], ui=mfit$upper[T], li=mfit$lower[T], 
col="green", lwd=1, lty=3, add=T)
        legend("bottomleft", c("Weibull fit", "1 - Empirical CDF", "Empirical 
S(t)"), col=c("red", "blue", "green"), horiz=F, lty=1:3, lwd=2)

}


my_plot()




About your questions
1 - What do you mean?, I guess I don't know the answer -- that's why I have 
posted my question on this mailing list.
2 - As I wrote in my first email, I am interested only in the first portion of 
the survival times (24 hours). Hence, I'd prefer to have a "good" fit over the 
interval of interest rather than a "reasonable" fit over the whole data.
3 - I went though those tutorials, but couldn't find an answer. That's probably 
due to the fact that those tutorials are "general", as you pointed out, while 
my question is rather specific.

Cheers,
Michele




On Nov 14, 2011, at 5:59 PM, David Winsemius wrote:

> 
> On Nov 14, 2011, at 6:11 AM, Michele Mazzucco wrote:
> 
>> Hello David,
>> 
>> thanks for your answer.
>> I have done as you told me, however the fit is very poor, much worse than 
>> that obtained from using the whole dataset (without upper bound).
>> Any idea?
> 
> Counter questions in the absence of data:
> 
> ??? Is there a reason from domain specific knowledge and theory to expect 
> that the procedure you are attempting should be correct?
> 
> ??? Why would you want to exclude data?
> 
> ??? Why are you not looking for more general tutorials (such as the one by 
> Ricci) rather than asking what is as yet an open-ended question on fitting 
> methods that surely does not admit an answer easily provided on a mailing 
> list?
> 
> -- 
> David.
> 
> 
>> 
>> Thanks,
>> Michele
>> 
>> On Nov 4, 2011, at 8:56 PM, David Winsemius wrote:
>> 
>>> 
>>> On Nov 3, 2011, at 7:54 AM, Michele Mazzucco wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I am trying to fit a distribution to some data about survival times.
>>>> I am interested only in a specific interval, e.g., while the data lies in 
>>>> the interval (0,...., 600), I want the best for the interval (0,..., 24).
>>>> 
>>>> I have tried both fitdistr (MASS package) and fitdist (from the 
>>>> fitdistrplus package), but I could not get them working, e.g.
>>>> 
>>>> fitdistr(left, "weibull", upper=24)
>>>> Error in optim(x = c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L, 
>>>>  :
>>>> L-BFGS-B needs finite values of 'fn'
>>>> In addition: Warning message:
>>>> In dweibull(x, shape, scale, log) : NaNs produced
>>>> 
>>>> Am I doing something wrong?
>>> 
>>> You didn't supply data to test,  but shouldn't you supply a lower bound if 
>>> you want to fit "weibull"? It is, after all, bounded at 0.
>>> 
>>>> left <- c(529L, 528L, 527L, 526L, 525L, 524L, 523L, 522L, 521L, 
>>>> 50*runif(100))
>>>> fitdistr(left, "weibull", upper=24)
>>> Error in optim(x = c(529, 528, 527, 526, 525, 524, 523, 522, 521, 
>>> 18.3964251773432,  :
>>> L-BFGS-B needs finite values of 'fn'
>>> In addition: Warning message:
>>> In dweibull(x, shape, scale, log) : NaNs produced
>>> 
>>>> fitdistr(left, "weibull", upper=24, lower=0.5)
>>>    shape         scale
>>> 0.58195013   24.00000000
>>> ( 0.04046087) ( 3.38621367)
>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> Michele
>>>> 
>>>> 
>>>> p.s. I have seen similar posts, e.g., 
>>>> http://tolstoy.newcastle.edu.au/R/help/05/02/11558.html, but I am not sure 
>>>> whether I can apply the same approach here.
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>> 
>> 
> 
> David Winsemius, MD
> West Hartford, CT
> 

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fit continuous distribution to truncated empirical values

Reply via email to